ABSTRACT 

An investigation of program exchange techniques and methods of 
evaluating such procedures is conducted. Basic hardware and system 
parameters vitally affecting program exchange are discussed. The basic 
program exchange methods covered are 1) the Complete Program Exchange, 
2) the Block or Page Exchange, and 3) the Completely Integrated System 
Exchange. The investigation is conducted using heuristic analysis, 
and a simulation study is conducted on selected methods. The combined 
analysis not only evaluates present methods but provides a guide for 
evaluating and selecting an Exchange technique for any system configura- 
tion. A complete multiprogramming system simulator and a specific 
technique for status preservation are presented as Appendices. 

The author wishes to express his appreciation to Mr. Jules I. 
Schwartz and the members of the ARPA Time-Sharing Project, System 
Development Corporation, Santa Monica, California for their assistance 
and to Professor Mitchell L. Cotton for his invaluable advice and 
encouragement during this investigation. 
1. Introduction 

Recent technological advances in high speed logic, digital data 
transmission, random access mass storage devices and related fields 
have freed the system designer from many former constraints. The 
resulting complex large scale systems will be able to operate in a 
practical and efficient manner only through the use of a technique 
such as multiprogramming. The term, multiprogramming, is applicable 
primarily to the computer with a single processing unit and is defined 
as the execution of several programs by the transferring of control 
among them in a controlled fashion, but where one and only one program 
is in control at any one time. (9) 

Multiprogramming, by definition, includes the concept of pro- 
viding simultaneous service to several on-line users. However, the 
majority of the work in the field is involved with concurrent-type operations: 
the interleaving of various types of programs to improve overall efficiency. 
A common example of this is the combination of the compute-limited and 
the I/O-limited program to permit each element of the system (i.e., the 
central processor and each peripheral device) to use a greater portion 
of the operating cycle. The goal of this type of operation is the full-time 
use of every system element. 

An area which is now receiving attention, and deserves much 
more, is that of time-sharing, or the furnishing of service to several 
on-line users. Bright and Chedleur have suggested the term "multiple 
break in operation" to more aptly describe this type of operation, which 
concentrates on user service. (3) Due to inherent reaction time delays 
in man-machine communications, several on-line users can obtain vir- 
tually simultaneous service. The system can offer assistance simultane- 
ously to several programmers for on-line debugging and program modifica- 
tion in the program formulation phase; provide a control system for war 
game simulation and data retrieval; and perform various command and 
control functions. Short bursts of service are thus provided to several 
on-line users, while the batch jobs, serving as a system background, 
are only slightly degraded. This paper will approach multiprogramming 
in this service context. Accordingly, time-sharing, which has been 
used in various ways in the literature, will be defined as the essentially 
simultaneous use of a central processor by several on-line users. 

The introduction of multiple simultaneous users into a system 
immediately creates severe control problems. A comprehensive Executive 
Control Routine (ECR) is required to exercise positive control over the 
system. The characteristics of this routine and the effectiveness of its 
control will exert a dominant influence on overall system efficiency. 

One of the most difficult problems faced by the Executive Control 
Routine is program exchange. To permit several simultaneous users in 
the system, an efficient method must be devised to exchange programs 
while retaining all status information and program environment. This 
information must be readily available for use when restoring a previously 
terminated program. As control is passed from one program to another, 
the loading cycle must include the resetting of the environment and 
status. This paper will investigate and analyze the various aspects 
of the Exchange problem and outline the general hardware and software 
requirements. Particular characteristics will be noted, and a general 
analytic procedure for evaluating various exchange methods will be developed. 

A System Simulator will be used where applicable to test the effectiveness 
of various exchange techniques on overall system performance. The final 
result will not only furnish a summary of the exchange methods available 
but also provide a general analytic technique, using both heuristic and 
simulation methods, to evaluate future exchange methods. 





2. Background 

Several widely diverse interest groups can benefit from a time- 
sharing system. Each, individually, would impose slightly differing 
timing and control problems, but these could be accommodated relatively 
easily in a general system. The technical knowledge of each group will 
vary widely, and the system must retain the capability of serving the 
most experienced and well trained user while providing service to the 
neophyte. The following paragraphs will describe several areas where 
time-sharing could be utilized to good advantage. Some are current 
problems that are presently handled by other means, while others are 
uses that would become economically feasible through the use of an 
approach such as time-sharing. 

Debugging is too time consuming to justify giving an individual 
programmer sole use of a large computer. Time-sharing, however, 
provides virtually instantaneous computer reaction time and allows 
others to operate while the programmer is interpreting results. The 
excessive turn-around time experienced in closed shop batch job 
operation is thus avoided, or at least, considerably reduced. Another 
desirable feature oriented toward the engineer/programmer is the ability 
to make repeated runs, changing parameters on the basis of preceding 
runs without closed shop delays. This is easily accomplished under 
time-sharing. 

Real time operations fall within the purview of the time-sharing 
system. In control and monitoring applications, where the demands 
for service are normally absolute, the system can easily become 
saturated, and extreme care must be taken when including this type of 
activity in an operating system. 

On-line data retrieval appears to be the most promising of the 
future applications of time-sharing. A user will have access, from a 
remote station, to a large data base. This will not only allow faster 
handling of tasks that were formerly done manually, but encourage the 
use of data retrieval to enable more informed decisions to be made. 
The ability to provide such a service to a multitude of users makes it 
economically feasible for all. Banking uses and airline reservation 
systems fall into this general category. 





3. The Executive Control Routine 

All multiprogramming and multiprocessing systems, from the 
basic stacked-job processor to the most complex on-line system, depend 
on an Executive Control Routine for their effectiveness. The increasing 
interest in multiprogramming systems has been reflected in the greater 
emphasis being placed on the general subject of control routines. The 
primary requirements of the Executive Control Routine are reliability and 
effectiveness. The user must be able to assume that the system program 
is error free and virtually foolproof, and that long periods of service can 
be expected. This implies that the Executive has the capability of 
recovering from both object program and machine errors. 

Though terminology may differ, the Executive Control Routine con- 
sists of five basic parts. The Interrupt Handler passes control to various 
parts of the Executive and establishes the initial flow. The Scheduler 
determines what programs are to be run, in what order, and for how 
long. The Dispatcher handles all I/O transfers, and the Sequencer 
establishes the normal flow through the entire Executive. The Exchange 
Routine uses the information from the Scheduler as inputs to allocate 
internal and external storage and initiate program transfers through the 
Dispatcher. Other functions of the Executive include Rollback and 
Recovery, interpretive packages and, if provided, debugging routines. 
The Executive must, in general terms, perform as an efficient and 
capable executive or supervisor, and the broad requirements can be 
determined without difficulty. It is when specific portions are subjected 
to detailed investigation that the difficulties become more apparent. 
The constraints and problems created by a specific system or approach 
will vitally affect the system in question, and all facets must be 
carefully considered. 

To provide a general idea of how the Executive functions, the 
sequence of operation of a typical time-sharing system will be described. 
All object programs are initially stored in a high speed random access 
store, placed there by a load command to the Program Exchange Routine. 
At the start of a basic cycle, the Scheduler determines the queue, and 
the Exchange Routine brings the required program into core at, or preferably 
before, its actual active quantum. If a storage conflict occurs, the 
previous programs are transferred to the external store as required. At 
the completion of a program's active period, whether through an early 
termination or the normal end of a quantum, the program environment 
(i.e., the status of all operational registers, etc.) is saved and, dependent 
upon the system load, the program is either saved in core or transferred 
to the external store awaiting its next turn. A basic concept that will 
be followed throughout this paper is that no user will be transferred 
from core unless a storage conflict exists. The specific conflicts will 
be determined by the exchange method as will procedures to reduce 
both the probability and the effect of such conflicts. 
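The cycle just described can be sketched as a toy model. This is a minimal illustration under stated assumptions, not the paper's Executive: program sizes are in hypothetical words of core, the Scheduler's queue is given as a plain list, and the choice of which resident program to dump on a storage conflict (the one run least recently) is an assumption, since the text says only that previous programs are transferred as required. The names `Exchanger`, `bring_in`, `end_quantum` and `run_cycle` are hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class Program:
    name: str
    size: int  # core required, in hypothetical words
    status: dict = field(default_factory=dict)  # saved register environment


class Exchanger:
    """Toy Exchange Routine: a program leaves core only on a storage conflict."""

    def __init__(self, core_size: int):
        self.core_size = core_size
        # Resident programs, ordered from least to most recently run.
        self.in_core: "OrderedDict[str, Program]" = OrderedDict()
        self.log: list = []

    def free_core(self) -> int:
        return self.core_size - sum(p.size for p in self.in_core.values())

    def bring_in(self, prog: Program) -> None:
        if prog.size > self.core_size:
            raise ValueError(f"{prog.name} cannot fit in core")
        if prog.name in self.in_core:
            # Already resident: no transfer is needed.
            self.in_core.move_to_end(prog.name)
            return
        # Storage conflict: dump least recently run programs until it fits
        # (this victim rule is an assumption of the sketch).
        while self.free_core() < prog.size:
            victim, _ = self.in_core.popitem(last=False)
            self.log.append(f"dump {victim}")
        self.in_core[prog.name] = prog
        self.log.append(f"load {prog.name}")

    def end_quantum(self, prog: Program, registers: dict) -> None:
        # Save the environment; the program itself stays in core.
        prog.status = dict(registers)


def run_cycle(queue, exchanger: Exchanger) -> None:
    """One basic cycle over the Scheduler's queue."""
    for prog in queue:
        exchanger.bring_in(prog)
        # ... the object program would run for its quantum here ...
        exchanger.end_quantum(prog, {"pc": 0, "acc": 0})
```

With 100 words of core and three 40-word programs scheduled as A, B, C, A, the log reads `load A`, `load B`, `dump A`, `load C`, `dump B`, `load A`: no program is moved out of core until a conflict forces it.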

This paper is primarily concerned with the Program Exchange 
portion of the Executive Routine, and it will be assumed that the 
remainder of the Executive performs its basic functions in a normal 
manner. If any deviations are required by a particular exchange 
method, they will be noted. 








4. Program Exchange 

As long as compute speeds greatly exceed I/O rates, as they 
do at present, the program exchange phase will remain a critical 
operation. When the Scheduler determines that a request is to be honored, the 
Exchange Routine determines which jobs, if any, need to be dumped to 
provide the required core space and where to load the new program; it also 
saves all required information, sets memory protection limits when 
applicable and handles relocation if the system provides it. The 
Exchanger also handles space allocation and maintenance in both core 
and external storage devices. 

4.1 General Considerations 

The effectiveness of the exchange technique will be determined 
to a great extent by the hardware configuration of the system. Memory 
protection is of vital importance, and it should be emphasized that 
without this feature the integrity of even the Executive Routine cannot 
be guaranteed. Relocatability is both a hardware and a software feature. 
Lack of this capability seriously hampers the system in that dynamic 
space allocation is virtually impossible and space is essentially 
allocated by the compiler. 

Although several of the exchange algorithms to be discussed are 
hardware limited, some can be used to overcome hardware deficiencies 
and improve the system. The system itself determines to a great extent 
the type of swapping used, and core utilization must be carefully weighed 
against the overhead time required. In the analysis of the exchange algorithms, 
the emphasis will be placed on hardware requirements, overhead and 
core utilization with the aim of providing the user with the most effi- 
cient service. The basic approaches will be handled first, and then 
various combinations investigated to find the most effective methods of 
solving the swap problem. 

The normal exchange methods can be divided into two basic groups, 
those that swap entire programs, and those that handle blocks or "pages", 
the blocks or "pages" being defined as data blocks less than program 
size in length. Each method has its particular advantages and, although 
complete program swaps have been favored in the past, the increase in 
the number of users in a time-sharing system coupled with the reaction 
time required and the increasing size of programs calls for a critical 
re-evaluation of both methods. Many of the problems are common to 
several techniques and will be discussed in detail when they first occur 
and only mentioned in subsequent methods. 

4.2 Exchange Techniques and System Capacity 

The reaction time of the system is one of the basic parameters of 
a time-sharing system. The decision as to reaction time will vitally 
affect both the user and the system. Selecting too small a reaction 
time ($t_r$) will seriously decrease the number of active users serviced 
during a cycle, while too long a period will result in disgruntled users. 
Although the users are receiving better service than could be expected 
in a closed shop operation, personal observation has shown that the 
addition of even a few seconds delay will tend to cause general 
discontent among users. A second basic system parameter is the quantum 
time ($q$), or the time allotted to a specific user during a given cycle. The 
determination of quanta is treated in detail by Lt. W. G. Wilder (31). 
The Executive Control Routine establishes an operating queue and passes 
control to the proper programs. The only delay in the passing of control 
is the non-availability of the program in core. Due to the length of 
transfer times relative to compute times, the basic exchange time ($t_s$) 
will normally exceed the quantum and will become the limiting factor in 
the number of users serviced. In the normal passing of control, the 
program in core is dumped, and the required program is loaded and run for the 
quantum. Thus $(2t_s + q)$ is required for each user, and the maximum 
number of active users per cycle is given by: 

$$ n_a = \frac{t_r}{2t_s + q} $$
This is based on the complete swap approach when, due to hardware 
limitations and/or program size, only one program can exist in the core 
at any one time. These constraints will be discussed more fully in the 
following sections. 

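If programs are relocatable, two users may exist in core at the same 
time, so that one program can be dumped and its successor loaded while 
the other runs. The maximum number of active users per cycle then becomes 

$$ n_a = \frac{t_r}{2t_s}, \qquad q \le 2t_s $$

and $q$ may be as great as $2t_s$ with no degradation of service. The process 
is illustrated in Fig. 1, which starts at an arbitrary instant with program A 
running. 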
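The two capacity expressions can be compared numerically. The sketch below evaluates $n_a = t_r/(2t_s + q)$ for the complete swap with one program in core and $n_a = t_r/(2t_s)$ for the overlapped two-program case. Extending the overlapped case beyond $q \le 2t_s$ by charging each user $\max(q, 2t_s)$ is an assumption of this sketch, as are the example timings, which are illustrative and not taken from the paper.

```python
import math


def active_users_per_cycle(t_r: float, t_s: float, q: float, overlapped: bool) -> int:
    """Maximum number of active users serviced in one reaction time t_r.

    t_s: basic exchange time (one dump or one load); q: quantum.
    Complete swap, one program in core: each user costs 2*t_s + q.
    Two relocatable programs in core: the dump and load (2*t_s) proceed
    while the other program runs, so each user costs 2*t_s when q <= 2*t_s.
    Using max(q, 2*t_s) for longer quanta is an assumption of this sketch.
    """
    if min(t_r, t_s, q) <= 0:
        raise ValueError("times must be positive")
    per_user = max(q, 2 * t_s) if overlapped else 2 * t_s + q
    return math.floor(t_r / per_user)


# Illustrative timings in milliseconds (not from the paper).
for q in (50, 200):
    print(q, active_users_per_cycle(2000, 100, q, False),
          active_users_per_cycle(2000, 100, q, True))
```

With $t_r$ = 2 s and $t_s$ = 100 ms, raising the quantum from 50 ms to 200 ms cuts the complete-swap capacity from 8 to 5 users, while the overlapped scheme stays at 10.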


[Figure 1: A basic space sharing scheme (holds for $q \le 2t_s$)]


It would appear that $n_a$ can be further increased by allowing more 
users in core and more simultaneous transfers, and this is true. However, 
this requires multiple high speed I/O channels, and this, in itself, 
presents new problems. 

The analysis of particular methods of program exchange is intended 
to both delineate the capabilities and limitations of the individual pro- 
cedures and also to demonstrate a general approach to the exchange 
problem. As new hardware developments permit new techniques for 
program exchange, these can be analyzed in a similar manner and 
evaluated against the same standards. Before treating individual methods 
of exchange, a discussion of generalized techniques such as space 
allocation, memory protection, relocation and random access storage 
media will provide a general background and permit certain vital 
hardware capabilities to be noted without repeated development. 





4.3 Space Allocation 

The preceding discussion of the effect of exchange methods on 
system performance also raised the problem of space sharing or space 
allocation. The examples used were extreme cases, whereas 
this section will treat the general allocation problem. Space allocation 
methods and program exchange techniques are closely interwoven, and 
the complete specification of one virtually determines the other. Because 
of this interdependence, space allocation is of vital importance in both 
the complete program and the block exchange concepts. The allocation 
scheme may range in complexity from the basic, one user approach, 
to full packing and even to the interleaving of instructions found in 
languages such as LISP and IPL. The basic ideas presented will be 
applied in various exchange methods, and the specific analysis will 
show the details involved and enumerate the hardware requirements 
that are imposed. 

The graphic representation of Fig. 2 will be used to illustrate 
the dynamic changes in core storage during a typical interval for each 
of the basic allocation types. Because of the intended multiprogramming 
emphasis, it will be assumed that the Executive remains in core, and 
the allocation methods will be concerned only with the remaining core. 

The basic single user allocation is depicted in Fig. 2a. This is 
the type normally used in a system control program such as the Fortran 
Monitor and results in low core utilization factors, typically 0.10. 
The space allocation in this case consists of simply assigning all 
available core to the next user. The effect of this approach will be 
more fully explored in the analysis of the basic complete program 
exchange method. 

The dense, complex packing shown in Fig. 2b is used to best 
advantage in the system where memory is initially loaded, and then all 
programs run to completion before any exchanges are made. This type 
of multiprogramming system, typified by the Ferranti FP6000, concen- 
trates on utilization efficiency and is not concerned with on-line users. 
The complexity of packing for each load can lead to complications and 
delays if the load is frequently changed. This method does not seem 
suited to any great extent to the time-sharing system with its constantly 
changing load and the requirements for exchanges every quantum. The 
problems created by time-sharing can be seen in Fig. 2c, where, although 
some packing is attained, the overhead for the space sharing allocation 
plus the occasional dead time reduces the system efficiency. 

Fig. 2d is similar to the ideal block allocation case previously 
mentioned. The average utilization factor lies between cases a and b, 
and, as the programs approach the block size in length, the higher factor 
of b is approached. The method requires either large memories, short 
programs or program modification, but avoids many of the disadvantages 
of the other methods. For maximum system efficiency, the exchange 
time, $t_s$, should be completely overlapped by the minimum $q$. This 
requires $q = 2t_s$ for Fig. 1, but only $q = t_s$ for Fig. 2d; the 
methods do, however, require one and two high speed transfer channels, 
respectively. The constraints imposed by this requirement will be 
discussed in the development of the complete program exchange. 
4.4 External Storage Devices 
When a program exchange is made, the program being removed 
must, obviously, be placed in some external storage device. The 
types of external storage available to the system will vitally affect 
the exchange method selected and the efficiency of the exchange 
phase of the control routine. Transfer rates and access times vary 
greatly not only between types, but among classes of specific types. 
A brief description of various external storage devices will establish a 
background for future decisions and delineate possible problem areas. 
The ideal storage device is the magnetic core memory which 
provides an extremely high speed, random access storage device. 
Cost factors, however, render this impractical for the storage of 
large amounts of data. The widely varying loads to be expected in a 
multiple on-line user environment require the capability of storing the 
programs of all active users. The amount of external storage available 
for program exchange will establish one of the basic limits on the number 
of users allowed into the system at a given time. This limit is the 
maximum number of stations in use, and not the lower number 
of active users in a given cycle. While disc and drum storage, both 
random access devices, are the most applicable to the multiprogramming 
exchange problem, magnetic tape is also included because of both its 
flexibility and its low cost. These, along with some of the less common 
storage devices, will be described in the following paragraphs. A 
summary of representative storage devices is presented in the graph 
of Figure 3 and the Table of Figure 4. 
4.4.1 Magnetic drums 

Magnetic drums provide an extremely fast random access capability 
at a lower cost than core. The magnetic tracks on the circumference of 
the drum may be read by either fixed or movable read/write heads. The 
circular track implies a maximum rotational delay of one drum rotation 
and an average time of half this amount. The movable head drum requires 
proportionally fewer heads and has a lower cost per character than the 
fixed head type. Positioning time, however, is increased from zero in 
the fixed head system to typical values of 50 to 300 msec. for various 
movable head systems. The cost vs. access time relationship can 
more easily be seen in Fig. 3 and Fig. 4. 
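The access-time relationships just described lead to a simple estimate of the time for one program transfer: positioning time, plus an average rotational delay of half a revolution, plus the transfer itself. The sketch below assumes the transfer starts as soon as the head reaches the record and ignores gaps and control overhead; the rotation period, transfer rate and program size in the example are hypothetical, and only the 50 to 300 msec positioning range comes from the text.

```python
def transfer_time_ms(chars: int, chars_per_sec: float, rotation_ms: float,
                     positioning_ms: float = 0.0) -> float:
    """Expected time to move one program between core and a drum.

    positioning_ms is zero for a fixed head drum and typically 50 to 300 ms
    for a movable head drum; the mean rotational delay is half a revolution.
    """
    rotational_delay_ms = rotation_ms / 2
    data_ms = 1000.0 * chars / chars_per_sec
    return positioning_ms + rotational_delay_ms + data_ms


# Hypothetical 32,000-character program, 20 ms revolution, 400K chars/sec.
fixed = transfer_time_ms(32_000, 400_000, 20)
movable = transfer_time_ms(32_000, 400_000, 20, positioning_ms=150)
print(fixed, movable)  # 90.0 240.0
```

In this example positioning alone more than doubles the exchange time, which in turn reduces the number of users that can be serviced per cycle.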
4.4.2 Magnetic discs 

Disc files are a natural step from the drum approach. Several 
discs mounted on a common shaft provide a lower cost per character 
than found in the drum devices. The time required to position mechani- 
cally the read/write heads plus an increased rotational delay makes the 
typical disc considerably slower than the drum. Various schemes for 
positioning the access arms are used, but all must solve difficult, but 
basically mechanical, problems. Of major importance among these 
problems are increasing the speed and accuracy of positioning and 
preventing damage to the disc surface from physical contact with 
the heads. Complex mechanisms have been developed to handle the 
positioning, floating heads maintained by air cushions to protect the 
discs and positive measures provided to ensure separation in the event 
of power failures. The problems, however, are not completely solved, 
and movable head discs remain chiefly as large capacity storage devices 
with the disadvantage of excessively large access times. In the early 
disc devices the arms moved in unison, while newer units permit 
individual movement. A typical late model is the Data Products disc 
being installed in the Q-32 Time-Sharing System, in which the access 
arms are mechanically independent, although, at present, only one 
arm may be positioned at a time. Careful allocation may decrease the 
effective access time, although the apparent mean will not decrease. 
In the Data Products unit, the access arm carries eight heads, 
four above and four below, which read or write simultaneously in a 
serial-parallel mode. The rotational delay encountered in the drum is also 
present in the disc storage unit. The movable head disc does provide 
an excellent random access storage for large data bases that would 
saturate the practical drum system. 

A recent development in mass storage devices is the fixed head 
disc which may be thought of as a three dimensional drum storage 
surface. Access times are comparable to those of fixed head drums, on the 
order of 20 milliseconds. Due to the large number of characters handled 
through the basic control unit, the cost per character is comparable 
to that of high speed drums of lower capacity. In the commercial unit, 
the Burroughs B472, the transfer rate is quite slow, 100K characters 
per second. This results in transfer times almost an order of magnitude 
higher than those commonly available in drums. Undoubtedly, serial- 
parallel transfers would increase the transfer rate, but this would also 
increase the cost to the point where it equalled at least that of high 
speed drums. The fixed head disc should be thought of as the logical 
successor to the high speed fixed head drum, rather than a major 
breakthrough. 
4.4.3 Magnetic tapes 

The lowest cost device with the greatest capacity is still magnetic 
tape. The chief disadvantage, which proves to be decisive in most 
cases, is the lack of random access. Records are available only 
sequentially, and intolerable delays will result if appreciable tape 
searching is required. High density tape is capable of speeds in the 
vicinity of 62.5K characters/sec. If the tape does not require positioning, 
access times are on the order of five to thirty milliseconds, which is 
comparable to other types. However, unless only two programs are 
repeatedly active, searches are required and access times in the range of 
tens of seconds are encountered. This would obviously degrade a system to 
the point of near uselessness. Tape does still provide an excellent 
storage for programs that are not active but need to be retained in the 
system. An example of this is the handling of disc overflows, where 
programs selected by some aging criterion are transferred to tape when 
active users require more disc capacity than is available. Tape, 
therefore, appears useful primarily as a backup, or secondary, store 
and should be considered as a primary store only if more effective 
types of storage are not available and/or severely reduced 
system performance is acceptable in light of the cost reductions. 
4.4.4 Cartridge units 

A promising storage device is the cartridge unit with replaceable 
sections which present a combination of the random access disc and 
the large volume characteristics of tape. Large delays are encountered 
when non-loaded cartridges are required, but this appears to be a method 
of handling disc overflows that should not be overlooked. Rather than 
transfer selected programs to tape as space is required, cartridges would 
be switched and retained for future use. There are problems in implementa- 
tion, but the approach deserves consideration and careful observation to 
allow inclusion into the multiprogramming system when performance 
warrants it. If positioning times can be reduced, this device will be 
able to compete with discs in the large volume storage area. 
4.4.5 Woven screen 

The woven screen memory concept now under development is the 
multiprogramming storage device of the future and would solve many 
of the program exchange problems. A large capacity, high speed random 
access storage device would be available at a relatively reasonable 
cost, estimated at only about three or four times that of a high speed 
drum. It appears that the reduction in swap time, if swapping is required 
at all, will provide a sufficient time saving to justify the use of such 
a storage device in terms of the great increase in the number of active 
users permitted. 
4.4.6 Comparison of storage devices 

The characteristics of representative units of the various storage 
device types are shown in Figure 4. The woven screen memory is 
included only as a forecast and will not enter into future discussions. 
It should be noted that the transfer rates of discs are only two to four 
times that of drums, while the access times are at least one order of 
magnitude greater. If schemes can be found to reduce the effective 
access time, discs could compete favorably with drums. The ability 
to issue a "look ahead" command followed by reading provides one 
scheme and will be fully explored. The higher cost per character of a 
drum requires that the size required be carefully determined. Fig. 3 
provides a graphic presentation of the primary areas to which each 
storage device applies in terms of cost, typical sizes, and access 
time. These several descriptions provide an idea of the advantages 
both practical and economic which accrue from the proper combination 
of external random access storage devices. The basic characteristics 
described will be applied in any determination of the optimum exchange 
methods. 
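One way to see why a "look ahead" command helps is to assume that the Exchanger, knowing the Scheduler's queue, issues the positioning command for the next program when the current program begins its quantum, so that positioning overlaps computation and only the excess over the quantum is actually waited out. The function below is a minimal sketch under that assumption (one access arm, one pending positioning at a time, dump transfers ignored); its name and the example figures are hypothetical.

```python
def exposed_access_delay_ms(quanta_ms, access_ms: float, look_ahead: bool = True) -> float:
    """Total access delay the system actually waits out over one cycle.

    quanta_ms[i] is the quantum of the i-th program in the queue. With look
    ahead, positioning for program i is started when program i-1 begins its
    quantum, so only max(0, access_ms - quanta_ms[i-1]) remains visible.
    The first program in the cycle always pays the full access time.
    """
    total = 0.0
    for i in range(len(quanta_ms)):
        if look_ahead and i > 0:
            total += max(0.0, access_ms - quanta_ms[i - 1])
        else:
            total += access_ms
    return total


queue = [100, 100, 100]  # hypothetical quanta, ms
print(exposed_access_delay_ms(queue, 150, look_ahead=False),
      exposed_access_delay_ms(queue, 150))  # 450.0 250.0
```

With a 150 ms disc access and 100 ms quanta, the visible delay per cycle drops from 450 ms to 250 ms in this model, which illustrates how a reduced effective access time could let discs compete with drums.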
4.5 Memory Protection 

Any practical multiprogramming system should provide a memory 
protection capability. During active periods, a minimum of two users, 
the Executive and one object program, must coexist in memory. If the 
system is to function in a reliable fashion, the status of the Executive 
must be preserved unaltered. Further, to provide satisfactory service 
to the users, they should be protected from each other. 

Two basic degrees of protection are readily apparent. The first, 
malfunction protection, is absolute in nature, encompassing both hardware 
failures and interprogram interference, and appears virtually impossible 
to attain. A comprehensive automatic recovery program such as the 
IBM FIX used in the Q-32 can overcome many hardware failures but 
requires extensive overhead and adds immensely to system complexity. 
This section will be concerned only with the second type of protection, 
interprogram protection, and the methods of attaining protection in an 
effective but flexible manner. 

Before deciding on a method, the degree of protection must be 
established. The Executive Routine must have access to all areas, 
whereas object programs should be either read or write protected or 
both, and illegal entries (jumps) should be prevented. Although it does 
not concern the Exchange problem, the I/O Dispatcher must prohibit 
write operations in forbidden (Executive) or unassigned areas of both 
external storage and core. 

The memory protection method and its effectiveness are very 
important in determining overall system performance. An 
uncontrolled program can damage not only itself but also other users and 
the Executive Routine, rendering the system inoperative. Inasmuch 
as the memory protection provided affects the program exchange 
method used, it also exerts an influence on the number of users 
permitted. This section will describe several memory protection schemes 
in detail. Later sections will demonstrate the influence of memory 
protection on various exchange methods. 

Due to the high frequency of core references in any program, 
protection should be implemented by hardware rather than software. 

The following list of characteristics is a modification of a list proposed 
by E.A. Codd (4) and represents the primary areas with which a memory 
protection scheme should concern itself. 

1. Resolution - The smallest block which can be protected. 
Flexibility in determining block size should also be considered. 

2. Adjacency - Can non-adjacent areas be simultaneously 
protected? What are the limits on this multiple protection feature? 

3. Types of protection - Are data handling and operational 
violations handled differently? What combination of read and/or write 
protection is afforded? 

4. Performance - What penalty, if any, is paid for protection in 
total system performance? 

5. Treatment of potential violations - How are violations handled? 

To provide a useful service to the user as well as a safeguard to
the system, potential violations should be trapped and an interrupt jump
to an error routine initiated. The user should be informed, as specifically
as is compatible with allowable overhead constraints, of the nature of
the violation. This general treatment of handling violations will suffice
for this section, and attention will be devoted to studying the ways
various memory protection schemes are implemented and how effectively
they perform.

The Bounds Register approach is one of the most useful protection
schemes. It is controlled by the Executive, which sets limit registers to
bound the area available to an object program. Hardware comparison
against these registers is made automatically and simultaneously for each
instruction requiring memory access. This technique is used on
the IBM Stretch and 7090, the CDC 6600, and the RCA 601. The RCA
601 also utilizes the lower limit register in conjunction with object
program addresses to provide relocation on loading. This method is
extremely flexible in both size and location, and multiple bounds
registers may provide protection to several non-adjacent areas. A
further hardware addition permits selection of protection of either the
area inside or the area outside the bounds. The disadvantages are the large
amount of logic circuitry required and the timing problems involved in
comparison.
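The comparison the hardware performs on each memory reference can be sketched as follows. This is a minimal illustration, assuming the Executive loads one or more (lower, upper) word-address pairs and a single flag selecting inside or outside protection; the names `access_permitted`, `check_reference` and `ProtectionViolation` are hypothetical, and the exception merely stands in for the interrupt jump to the error routine.

```python
class ProtectionViolation(Exception):
    """Stands in for the interrupt jump to the Executive's error routine."""


def access_permitted(bounds, address, protect_inside=False):
    """bounds: (lower, upper) word-address pairs loaded by the Executive.

    Normal mode: only addresses inside some bounded area are legal.
    protect_inside mode: the bounded areas are forbidden instead.
    """
    in_some_area = any(lower <= address <= upper for lower, upper in bounds)
    return not in_some_area if protect_inside else in_some_area


def check_reference(bounds, address, protect_inside=False):
    # The hardware makes this comparison for every instruction that touches memory.
    if not access_permitted(bounds, address, protect_inside):
        raise ProtectionViolation(f"illegal reference to address {address}")
    return address
```

With `bounds = [(100, 199), (400, 499)]`, two non-adjacent areas are open to the program: a reference to 450 passes and a reference to 300 traps. Each extra pair of registers is another comparator working in parallel, which is the logic cost noted above.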

A modified bounds register is used in the IBM 7040. In this
approach, the lower boundary of protection is loaded into a 9 bit register,
which is then compared against the higher order bits of the effective
address to determine whether they are equal or unequal. Another register
defines how many of the higher order bits will be compared. The value
in this count register determines whether the protected area enclosed,
starting from the base address, is 1K, 2K, and so on up to 64K. An additional
feature is the option of specifying the legality of the unequal or equal
condition. One choice, the illegal inequality, protects the area outside
the defined area. Conversely, the equality being illegal protects the
area inside the boundary. Less hardware is required by this method than
in the complete boundary register method, and timing considerations are
not as stringent. Due to the bit setting nature of the approach, the
protected areas are restricted to sizes of $2^n$K words.
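The equal/unequal test on high-order bits can be sketched as below. This is a simplified model, not the 7040 logic: it assumes a hypothetical 16-bit word address (so the enclosed area runs from 1K to 64K words) and compares the boundary register's high-order bits directly rather than modeling the 9 bit register width. The function names are illustrative.

```python
ADDRESS_BITS = 16  # hypothetical address width, chosen so areas range from 1K to 64K words


def block_size(compare_bits):
    """Words enclosed when the count register selects `compare_bits` high-order bits."""
    return 1 << (ADDRESS_BITS - compare_bits)


def high_bits_equal(address, base, compare_bits):
    # Only the selected high-order bits take part in the comparison.
    shift = ADDRESS_BITS - compare_bits
    return (address >> shift) == (base >> shift)


def access_permitted(address, base, compare_bits, equality_illegal):
    """equality_illegal=True protects the area inside the boundary;
    equality_illegal=False (the illegal inequality) protects the area outside it."""
    equal = high_bits_equal(address, base, compare_bits)
    return not equal if equality_illegal else equal
```

Because the area is fixed by how many bits are compared, its size can only be a power of two, and it must start on a boundary that is a multiple of that size; this is the price paid for the reduced comparison hardware.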

The mask register method of protection is similar to the limit register
approach. The mask register has one bit for each memory block, and the
setting of any bit establishes a protected area. Fewer hardware complexities
and timing problems arise, as the method requires only one bit per
block. The Executive can maintain a table of the memory areas assigned to
each program, or the table can be carried with the object programs, and each
change of control it makes will be preceded by a resetting of the protection
mask. The system provides for any combination of blocks desired, but the
size of the blocks is an inflexible hardware parameter.
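A minimal sketch of the one-bit-per-block mask, assuming a hypothetical 32K-word core divided into 1K blocks; as in the text, a set bit protects a block, so the Executive builds the mask by clearing the bits of the blocks owned by the program about to receive control. `protection_mask` and `reference_permitted` are illustrative names.

```python
BLOCK_SIZE = 1024   # hypothetical fixed hardware block size, in words
NUM_BLOCKS = 32     # hypothetical 32K-word core, one mask bit per block


def protection_mask(owned_blocks):
    """Mask the Executive loads before passing control: a set bit protects a block."""
    mask = (1 << NUM_BLOCKS) - 1          # start with every block protected
    for block in owned_blocks:
        mask &= ~(1 << block)             # unprotect the blocks this program owns
    return mask


def reference_permitted(mask, address):
    # One bit lookup per reference: no magnitude comparison is needed.
    block = address // BLOCK_SIZE
    return ((mask >> block) & 1) == 0
```

Any combination of blocks, adjacent or not, can be opened to a program, but an area smaller than `BLOCK_SIZE` cannot be protected on its own.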

The Ferranti Leo III uses a mask type of protection but applies it
to each individual storage location in the form of a tag. When a program
is entered into the memory, a tag identification is set on all locations
available to the program. Object programs may use only those locations
tagged for their use. Certain areas available to several programs are
given special tags. This method is effective in the environment where
several programs are placed in core and remain there until completion.
Non-adjacent areas are protected, and a protected area of any size is
selectable. However, the method becomes time consuming for applications
where frequent exchanges are encountered and is generally unsuitable
for time-sharing applications.
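Per-location tagging can be sketched as follows. This is not a model of the actual Leo III hardware; it assumes each location carries a program identifier and that a hypothetical table maps each special shared tag to the programs allowed to use it.

```python
class TaggedStore:
    """Sketch of tag-per-location protection (illustrative, not the Leo III design)."""

    def __init__(self, size):
        self.tags = [None] * size   # None marks an unassigned location
        self.shared = {}            # special tag -> set of program tags allowed to use it

    def assign(self, tag, start, length):
        # Every location must be tagged when the program is entered: this is
        # the per-exchange cost that makes the scheme slow under frequent swaps.
        for address in range(start, start + length):
            self.tags[address] = tag

    def permitted(self, program, address):
        tag = self.tags[address]
        return tag == program or program in self.shared.get(tag, set())
```

The loop in `assign` runs once per word, so its cost grows with program size on every load, which is why the text finds the method unsuitable when exchanges are frequent.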

The Q-32 uses a mask type of protection scheme. Each of the four 
logically separate 16K memory banks has a control flip-flop to establish 
a protected area. The primary disadvantage is the coarse, 16K resolution. 
This will be improved by a modification to provide mask registers for each 
block. Protection will be available in contiguous increments of 2K with 
no non-adjacency provisions. 

The next major approach is hardware lockout. In the Atlas application,
object programs operate in a different mode than the Executive and
cannot generate addresses in a restricted portion of memory. If,
due to hardware errors, illegal addresses occur, interrupt transfers to
the error routine are made.

Fixed or "read only" memories provide a rigid protection which
requires little overhead or parallel logic. Deposited capacitors or
inductive arrays are pre-set and cannot easily be altered. Compactness
and low power requirements enable high switching speeds to be attained.
In principle, this provides excellent storage for the Executive, but it
cannot even be considered for interprogram protection. Moreover, because
the stored contents are difficult to change, it is of doubtful value even
for the Executive.

The last program protection method is a software approach. This
is used in some debugging routines when no other methods are available,
but it is so time consuming as to preclude its use in most other areas.

The most practical protection scheme from the Program Exchange
viewpoint is a modified mask approach. The requirement for protection
of blocks under 1K in size cannot be justified in the practical case:
the hardware costs, overhead imposed, and circuit complexity far
outweigh any increase in capability. The masks would be reset each time
control is shifted. Confining the program in control to its own area
provides the desired protection, and no further non-adjacency requirement
is apparent. This is by no means a complete solution, but it adequately
defines a practical method which satisfies the general requirements of
memory protection for the purposes of Program Exchange methods.

4.6 Relocation 

The term relocatability can be defined as the independence of 
the object program from the constraint of occupying a specific area in 
memory. Relocatability provides a technique for improving the efficiency 
of program exchange schemes over the basic run and reload method. Core 
space allocation, which helps provide the desired improvement, is 
realizable only with relocatability. In most multiprogramming systems 
neither the system nor the program will know what area the object program 
will occupy at run time nor should they be required to. Indeed, due to 
exchanges, a program may occupy several areas during its execution. 
Combinations of hardware and software are best suited to providing the 
desired relocatability. 

The most common method of program relocation uses a base register
or pseudo base register. At first inspection, compilation to a base
address of zero with modification in accordance with a base register at
load time seems to solve the problem simply and effectively. In principle,
it does, but, upon closer investigation, problem areas are revealed.

The various instructions treat the address portion in a great variety of
ways, and no general modification scheme is apparent. Hence, each
instruction must be inspected and then modified in the proper way using
the base register. If a "relocation" bit, which signals the addition of
the base register, is provided, much of the hardware complexity is
avoided. This bit can be either part of the instruction itself or entered in
the header of the binary record used to control the load.
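Load-time relocation driven by a relocation bit can be sketched as follows. The sketch assumes a hypothetical word format whose low-order 15 bits are the address field, and a binary record that carries one relocation flag per word; `BinaryWord` and `load` are illustrative names, not a particular loader.

```python
from dataclasses import dataclass

ADDRESS_MASK = 0o77777  # hypothetical 15-bit address field in the low-order bits


@dataclass
class BinaryWord:
    value: int
    relocate: bool  # the "relocation" bit: add the base register to the address field


def load(record, base):
    """Load a program compiled to base address zero into core starting at `base`."""
    core_image = []
    for word in record:
        if word.relocate:
            address = (word.value & ADDRESS_MASK) + base
            if address > ADDRESS_MASK:
                raise ValueError("relocated address exceeds core size")
            # Keep the operation bits, replace only the address field.
            core_image.append((word.value & ~ADDRESS_MASK) | address)
        else:
            core_image.append(word.value)  # constants and absolute addresses untouched
    return core_image
```

The flag removes the need for the loader to decode each instruction's address usage: the compiler, which knows which fields are addresses, marks them once.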

Another approach is a basic system scheme which, incidentally,
provides relocation. This is page turning, developed to a great extent
in the Atlas system. Pages of a fixed size (512 words in the Atlas) are
transferred, and a page address register, containing the most significant
bits of the address of the block loaded, is associated with each page
read into memory. Hardware comparisons of the required page address
and the available page addresses are made to determine if the referenced
page is located in core. If it is not loaded, an interrupt is generated and
the page loaded as soon as possible. The hardware utilized for paging also
provides memory protection. The program relocation problem is
automatically handled through the page address register, and memory
allocation is simplified. The paging concept will be treated in greater
detail when studied as a basic program exchange technique.
While no particular scheme is favored, it will be assumed that
relocatability is provided by a specific exchange method when required.

5. Complete Program Exchanges

As a basic exchange method, the complete program approach has 
the chief advantage of simplicity. Programs are handled in a natural 
fashion, in their entirety, and no undue overhead is created. The entire 
program is available at all times during the quantum, and closed sub- 
routines can be utilized to reduce program size. No unusual demands 
are placed on the compiler as, at worst, only relocatability bits are
required. In contrast to the "page" approach, there are no inefficient
pauses while pages are checked and required pages called, nor is a
large overhead required to keep track of the pages and their status.
Space allocation, memory protection and relocatability provide the most 
efficient type of operation. The most serious disadvantage is the 
presence in valuable high speed core of large strings of coding which, 
although they cannot possibly be operated on in one quantum, must be 
transferred in and out of core during each active period. Several large 
users can effectively thwart space allocation and make the value of 
memory protection and relocatability marginal. A relatively small core 
and a large Executive Routine which must remain in core at all times 
impose a limit on maximum program size. This can become a major 
system constraint in an environment characterized by large programs. 
5.1 Storage Compacting 

Prior to handling specific methods, two fundamental problems
inherent in all program exchanges will be discussed. These are drum
compacting and preservation procedures. A running measure of the
external storage must be maintained to handle new arrivals. However,
because previous users have become inactive, the available storage may
well consist of broken blocks. The basic question is then when to
compact or compress the storage. Attempts could be made to insert a
new user in available areas, or, immediately upon ascertaining that
sufficient total space exists, the storage could be compacted and the new
user placed at the end of the active programs. Inasmuch as the system
envisioned would be open ended as far as users are concerned, it appears
little would be gained by attempting insertion. The inspection of all open
areas to determine if the size is sufficient would be time consuming, and
subsequent arrivals would most probably necessitate compacting in any
case. The general compacting philosophy will call for compacting
whenever it is determined that a new user can be accommodated into the
system and insufficient room exists after the last active program on the
external storage device.
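The compacting rule just stated can be sketched as an admission routine. It is a minimal illustration under stated assumptions: the drum is modeled as a hypothetical list of `[name, start, length]` entries for active programs in address order, sizes are in words, and, following the text, no attempt is made to fit a newcomer into a hole.

```python
def admit(programs, capacity, name, size):
    """Place a new user on the drum; return its start address, or None if it cannot fit.

    programs: list of [name, start, length] for active users, ordered by start.
    """
    end_of_last = programs[-1][1] + programs[-1][2] if programs else 0
    if capacity - end_of_last >= size:
        programs.append([name, end_of_last, size])   # room after the last program
        return end_of_last
    used = sum(p[2] for p in programs)
    if capacity - used < size:
        return None                                  # not enough total space at all
    # Enough total space but not at the end: compact, then place at the end.
    next_free = 0
    for p in programs:
        p[1] = next_free                             # slide each program down
        next_free += p[2]
    programs.append([name, next_free, size])
    return next_free
```

Compaction is triggered only when it actually buys admission of a new user, so a drum with holes is left alone until the space behind the last program runs out.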
5.2 Preservation of Environment 

The second problem concerns the preservation and storage of the
program environment, the information concerning the state of the object
programs such as operational registers, I/O control words, etc. This
information must be preserved for each program exchanged, and all of it
must be restored before the program is restarted. The actual operations
are to be handled by the Executive, but the storage location is not as
simply handled. If storage is to be maintained in the Executive, the table
capacity must provide for the maximum number of users. If the system is
large, valuable core space will be lost to preservation tables using this
method. The alternative is adding storage to each program which carries
the required information. This increase can be implemented most
effectively by the system at load time, although the compiler could add
the required space. The first method permits the Executive to reset the
operational registers while the program is being reloaded, but appears to
be very wasteful of core, especially in large systems. With the second
method the registers can be reloaded only after the program itself is in
core, but the time required to load several registers will be relatively
small (20 or fewer major cycles). In view of this and the more effective
use of core permitted, the second method of status preservation is
recommended. The general concept has been used in both the SDC TSS-2
and CDC 6600. A proposed preservation routine for the CDC 1604-160
Satellite System is presented in Appendix III.
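The recommended arrangement, a preservation area that travels with each program image, can be sketched as below. The register names and layout are hypothetical and are not the routine of Appendix III; the point is only that the Executive needs no per-user table, since each program carries its own saved state.

```python
from dataclasses import dataclass, field

REGISTERS = ("A", "Q", "index1", "index2", "pc")  # hypothetical operational registers


@dataclass
class ObjectProgram:
    name: str
    code: list
    # Preservation area appended by the system at load time (hypothetical layout).
    saved: dict = field(default_factory=lambda: {r: 0 for r in REGISTERS})
    io_control_words: list = field(default_factory=list)


def preserve(cpu, program, io_words):
    """At the end of a quantum: copy machine state into the program's own area."""
    program.saved = {r: cpu[r] for r in REGISTERS}
    program.io_control_words = list(io_words)


def restore(cpu, program):
    """After the program is back in core: reload its registers, return its I/O words."""
    for r in REGISTERS:
        cpu[r] = program.saved[r]
    return list(program.io_control_words)
```

`restore` can run only once the program image is in core, which is the small delay the text accepts in exchange for not reserving Executive table space for the maximum number of users.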
5.3 Basic Complete Program Exchange 

The fundamental complete exchange method is the run and then 
dump and reload procedure. Its lack of sophistication provides its 
greatest virtue, simplicity. No space allocation is implied, and, with 
only the Executive requiring protection and no relocatability required, 
the hardware problems are simplified. Users are loaded on the external 
storage, typically a drum, as they arrive, and the capacity of the drum 
determines the maximum physically allowable number of users. As only 
one object program is permitted in core, the maximum number of users 
for a given reaction time can be obtained from

$$N_{max} = \frac{t_r}{2t_e + q}$$

which is the worst case form previously developed. Lack of concurrency
of $t_e$ and $q$ seriously degrades the system performance. In addition,
during the exchange period for each user, the central processor is idle,
seriously reducing the computing efficiency. In a system where the
number of active users per cycle, a function of both the number of on-line
stations, N, and the job mix, is low, this type of exchange is valid.
If the smaller $N_{max}$ is acceptable, the simplicity of the approach
recommends it. This approach will serve as the basis for other complete
program exchange methods, and attention will be devoted to various means
of improving the efficiency and overcoming the disadvantages.
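The arithmetic behind the run and reload limit can be sketched as follows. This is a minimal illustration, assuming each user's turn is a strictly sequential reload, quantum and dump, with $t_e$ taken as the time to move one program in one direction (so a dump plus a reload costs $2t_e$), $q$ the quantum and $t_r$ the reaction time, all in the same units. The function names and sample values are illustrative only.

```python
def run_and_reload_cycle(t_e, q):
    """Length of one user's turn when load, run and dump are strictly sequential."""
    return 2 * t_e + q


def max_users(t_r, t_e, q):
    """Users who can each receive one turn within the reaction time t_r."""
    return int(t_r // run_and_reload_cycle(t_e, q))


def cpu_idle_fraction(t_e, q):
    """Share of each turn the processor spends waiting on the exchange."""
    return 2 * t_e / run_and_reload_cycle(t_e, q)
```

For example, with $t_r$ = 2000 ms and $t_e$ = $q$ = 100 ms, each turn lasts 300 ms, only 6 users fit within the reaction time, and the processor is idle two thirds of the time. Overlapping the exchange with the run is what the later methods attack.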
5.4 Two Level Storage 

The use of drum or fixed head discs seems natural in the basic
case due to the transfer times, and an increase in system capability
can be achieved by addition of a movable head disc with its large
volume storage, albeit with slower access times. In the most common usage,
the disc replaces tape storage and speeds system operation but has no
effect on the exchange problem. The disc, though, can serve as an
effective second-level storage device. Although the actual scheduling
will not be covered, the disc could handle large programs such as
compilers, whose long run times and few man-machine delays allow them
to tolerate a slower response. To prevent the introduction of several
compile type jobs from degrading the system through repeated slow
disc accesses, a reduced service interval could be used for
disc programs. If only one disc program were allowed per cycle, with
all programs on the disc receiving service from a round robin queue,
system response would suffer only slightly and virtually no delay would
be noticed. Using this approach, the compute limited jobs would receive
service while not overloading the system drums. This type of treatment
could also be applied to slow background or batch type jobs. A limit
should be placed on even this service, as the primary function of the
disc is to overcome tape deficiencies. The secondary object program
storage should not be permitted to curtail this to any great extent. The
overhead involved can be justified in view of the increased service
offered by the system and the increased storage made available on
the drum for generally smaller, fast reaction time programs.
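The one-disc-program-per-cycle rule can be sketched as a cycle builder. This assumes a cycle is simply the ordered list of programs to be served, with disc programs held in a round robin queue; removal of completed programs is omitted, and `build_cycle` is a hypothetical name.

```python
from collections import deque


def build_cycle(drum_users, disc_queue):
    """One service cycle: every active drum user, plus at most one disc program.

    disc_queue is rotated so that disc programs are served round robin
    across successive cycles.
    """
    cycle = list(drum_users)
    if disc_queue:
        program = disc_queue.popleft()
        cycle.append(program)
        disc_queue.append(program)   # back of the queue until its next turn
    return cycle
```

With two disc programs queued, each receives a turn only every second cycle, while every drum user is served every cycle; at most one slow disc access is added to any cycle.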

5.5 The Disc "Look Ahead"

In a drum and movable head disc system, the largest delay
encountered is in positioning the disc heads. Reduction of this effective
access time increases the efficiency of the concept, and means do exist
through "look ahead" features to reduce it. Discs are available
which permit the issuing of position commands only, and the transfer
operation is ordered only after the heads are positioned. Considerable
time could be saved if the positioning could occur concurrently with the
running of another program. Then, when the present quantum, q, was
completed, the next program would be ready for loading. In current units
this approach has the serious disadvantage of precluding the simultaneous
use of the disc by an object program, although the disc was basically
intended to replace tapes for such programs. The increase in system
capability offered by use of the "look ahead" must be carefully weighed
against the lack of disc reference for short periods. The true effect on
object programs involves the theory of program structure, which is beyond
the scope of this paper, but handling such references with a WAIT command
is one possibility. The WAIT reply, however, can cause one or more
programs to lose their entire useful quantum, and, because they remain
active until serviced, probability considerations show that the situation
is likely to deteriorate with no one receiving good service. For this
reason, the method is not recommended for general usage. The alternative
is the hardware capability of independently positionable movable heads or
fixed heads. Either of these adds considerable cost to the system but
provides a powerful capability. If the economic considerations warrant it,
one of these features should be included.
5.6 Complete Program Exchange with Space Allocation 

The three previous techniques, while providing increasingly
better service, are still constrained by the

$$N_{max} = \frac{t_r}{2t_e + q}$$

equation. Space allocation or space sharing of core provides the most 
practical method of improving this user factor. The concurrency of 
exchange and run operations permits the reduction of effective swap 
time, and permits either more users or better service to the same 
number of users. As previously developed, this method requires the
hardware for relocation and memory protection.
5.6.1 Hardware constraints

Concurrency of exchange and run modes will focus attention 
on the high speed channels, and some introductory material is in 
order. The subject of high speed transfer channels presents some 
interesting paradoxes. High speed is generally construed to mean a
transfer rate comparable to one word per memory cycle. Due to the
read/write nature of memory references, two high speed channels cannot
operate simultaneously in a given core unit. Slowing down the channels
and interleaving their operations does not improve speed, but only ensures
that each job will require the combined transfer time of both and that
they will be completed at about the same time. An alternative is the use of multiple
logically independent memory banks. A separate high speed line can 
be provided for each bank and a simple stepping mechanism used in 
the memory control unit to allow large programs to utilize several banks. 
The method also facilitates memory protection, especially at the block 
level. Fig. 5a and 5b depict the basic configurations. While the second 
has a higher cost due to both the more complex control unit and the mul- 
tiple high speed lines, its advantages are attractive. 
5.6.2 Hardware-determined methods

The extent to which space allocation is implemented depends upon
the number of simultaneous transfer operations permitted, a hardware
determined parameter. If the three operations - run, load and save -
proceed concurrently, a smaller effective exchange time results if the
mean program size is one third of the available memory or less. A quantum
at least as long as the time to transfer one third of memory is
presupposed. If only one transfer operation is provided, the acceptable
mean program size increases to one half the available memory, and
the minimum quantum becomes the time to transfer the entire core.
The quantum constraints are introduced to reconcile the general timing
problem. If very short quanta are allowed, it is impossible to complete
the exchange concurrently with the run operation, and little improvement,
if any, can be realized regardless of the exchange method. The quantum
must be as long as the time required to perform the complete exchange of
programs of mean size.
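The two hardware cases above reduce to a pair of threshold tests. This is a rough sketch of those stated rules only, assuming sizes in words, the quantum in milliseconds, and a single hypothetical transfer rate in words per millisecond; `overlap_effective` is an illustrative name.

```python
def overlap_effective(mean_size, core_words, quantum_ms, words_per_ms, concurrent_transfers):
    """Do the size and quantum conditions for hiding exchange behind the run hold?

    concurrent_transfers=True: run, load and save all proceed together.
    concurrent_transfers=False: only one transfer proceeds alongside the run.
    """
    if concurrent_transfers:
        size_limit = core_words / 3
        min_quantum = (core_words / 3) / words_per_ms   # time to move one third of core
    else:
        size_limit = core_words / 2
        min_quantum = core_words / words_per_ms         # time to move the entire core
    return mean_size <= size_limit and quantum_ms >= min_quantum
```

For a 32K-word core at 100 words per ms, full concurrency needs a quantum of about 109 ms, whereas a single transfer path needs about 328 ms even though it tolerates larger programs.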
5.6.3 Space-sharing exchanges

The use of space-sharing in the full program concept does not
assure increased effectiveness, since the allocation algorithm increases
the overhead. If the mean system program size is a large fraction of the
total available core, transfers will be required at almost every quantum.
The method will then proceed similarly to the basic run and reload method,
but will require the extra overhead to check for space-sharing possibilities.
A similar condition can result if many users are present in the system and
the size of the available core is small. If the mean program size is
greater than the average allowed per user, the system becomes saturated
and little is gained from attempted space allocation. The latter problem
can be overcome if concurrent transfers are permitted, but space-sharing
should be carefully considered if this concurrent capability is not provided.

The use of space allocation in conjunction with both the disc
and drum storage provides the maximum complete program exchange 
efficiency. This maximum efficiency requires that several conditions be
satisfied. First, the mean program size for drum users must be less
than half the available core memory. Second, hardware must exist to
provide at least one level of concurrency of exchange and run modes.
The previously mentioned requirement of relocatability must be met,
and memory protection of some sort is advisable. The disc would be
used to store a limited number of large programs, scheduled in a
slower manner than the drum users. Failure to satisfy any of these
requirements will detract from the method, and a thorough study should
then be made to see whether a less complex method would provide equally
efficient overall service.

6. The Block Exchange

The block exchange technique provides several extremely useful 
features to the time-sharing system. Block exchanges permit many 
users to reside in core concurrently and, depending on the program mix,
several of the transfers required by the complete program approach are
avoided. This reduces the average exchange time, $t_e$, and as shown
previously, increases the upper limit on the number of active users 
permitted. If additional blocks are required, the exchange time is, 
at worst, no greater than that required for the entire program to be trans- 
ferred. The penalty paid is in increased overhead which must be care- 
fully controlled. Use of the proper algorithm minimizes "page" turnings, 
or the number of times a block is dumped and then reloaded. A further 
advantage of the block exchange method is that the programmer is 
virtually freed from the constraint of particular machine memory size. 
Programs written in a general symbolic language can be compiled with
any applicable compiler without regard to memory size. Due to the 
relatively small size of the blocks, space allocation is more easily 
implemented than in the complete program exchange methods. 

The block concept greatly increases the complexity of the compiler, 
as the blocks must be assembled and references to other blocks treated 
in a distinctive manner. Some systems use hardware design to simplify 
these problems, and those without the hardware capability must rely on 
the time consuming software methods. One of the most obvious ways
to simplify the compiler is the liberal use of open subroutines, which
reduce the number of program jumps. This greatly increases program
size but, as this is no problem in the block method, it can be accepted.
If references are made to blocks not in core, pauses result which degrade
the particular program and lengthen system reaction time. In most
applications, complex checking routines are also required to first 
determine whether or not a reference is in the same block, and, 
secondly, if it is in another block, to determine if the required block 
is in core and where. If this is not the case, the location of the block 
on the external store must be determined, and the block loaded. This 
latter, three phase problem, basically the excessive time and space 
required for maintenance and checking, is one of the most serious dis- 
advantages of the block exchange. A second major disadvantage is the 
difficulty of access to large data bases. Successive references to 
widely scattered data entries generate numerous block calls which 
result in losses in efficiency. 
6.1 The Page Exchange Method 

One of the earliest uses of the block exchange method was in
the Ferranti Atlas system (17, 18). The implementation in this case
is extremely hardware oriented, and the discussion of the concept
has value not solely as a particular method but further as an example
of the capabilities that can be achieved by hardware design. It is
indicative of the features that will be further developed in the next
generation of large scale systems, and less sophisticated systems
will attempt to obtain the same advantages through software. The
system solution to the space allocation problem will also be discussed.
Although a complete explanation of the entire system will not be 
attempted, sufficient detail will be included to provide a background 
for the exchange method. 

The storage hierarchy of the Atlas is rather unusual, and the 
exchange method derives much of its power from this storage approach. 
The primary storage consists of normal ferrite core, a high speed drum 
(2 msec/block) and an unusual storage termed the fixed store. The 
fixed store contains several thousand words of fast (300 ns) storage
and consists of a woven wire mesh with small ferrite plugs inserted 
into the spaces. The presence or absence of a plug determines the 
state of the store. The fixed store holds complicated functions which 
permit both a simplification of the arithmetic unit and inclusion into 
the instruction code of complex instructions such as vector operations 
and polynomial evaluation. The core is divided into stacks of 4096 
words, each with an independent access to the central processing unit. 
The stacks are interleaved in pairs, odd and even, and instructions 
drawn from the store in pairs. Transfers between drum and core are 
direct, in 512 word blocks, and, once initiated, proceed autonomously. 

From an exchange viewpoint, the most valuable feature in the
Atlas structure is the one level store concept. The core and drum are
considered as a single large memory (maximum size $10^6$ words) and can
be addressed as such. In the following discussion some difference
between Atlas terminology and normal usage requires clarification.
The Atlas system defines a block as 512 words of information and a 512
word memory unit as a "page" in core and a sector on the drum. Further, 
the term address refers to the identifier of a required piece of information 
and does not necessarily provide its location. The 20 bit address (Fig.
6) consists of 11 bits providing the block address and 9 bits providing
the location of the word within the block. Each page of core has asso- 
ciated with it a 12 bit page address register, 11 bits of which identify 
the block contained. When a storage reference is made, a parallel com- 
parison of the page address registers is made, and non-equivalence 
indicates the block is not in core memory. A suitable interrupt is then
generated, and the Executive, termed the Supervisor in the Atlas system,
initiates the required transfer. This frequent, complex checking requires
involved system tables which are maintained in a subsidiary store
accessible only by the Executive. The block directory contains an entry for
each block in the one level store consisting of the block number, n, and
the location of the page or sector occupied by that block. Each object
program is assigned a distinct area, and a separate program directory
defines the areas occupied by each program. The number of blocks re- 
quired is one of the job input parameters. 
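The address split and the page address register comparison can be sketched as follows, assuming the 11 bit block field and 9 bit line field of a 20 bit address (512-word blocks). The parallel comparison of all registers is modeled by a loop, the lockout bit is ignored, and `core_location` and `NonEquivalence` are hypothetical names.

```python
LINE_BITS = 9                # 512-word blocks
BLOCK_BITS = 11              # block number field of the 20 bit address
PAGE_SIZE = 1 << LINE_BITS


class NonEquivalence(Exception):
    """Stands in for the interrupt raised when no page holds the referenced block."""


def split_address(address):
    """Return (block number, line within block) for a 20 bit address."""
    if not 0 <= address < 1 << (BLOCK_BITS + LINE_BITS):
        raise ValueError("not a 20 bit address")
    return address >> LINE_BITS, address & (PAGE_SIZE - 1)


def core_location(address, page_address_registers):
    """page_address_registers[p] is the block number held in core page p, or None.

    The hardware compares every register at once; a loop models that here.
    """
    block, line = split_address(address)
    for page, held in enumerate(page_address_registers):
        if held == block:
            return page * PAGE_SIZE + line
    raise NonEquivalence(block)
```

A program's addresses never mention core pages, so the same code runs wherever its blocks happen to land; only the page address registers change.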

During some operations such as drum or tape transfers, a page
cannot be moved or accessed by an object program. To protect such
pages a lockout bit is provided in the page address register. Reference
to a protected page causes an interrupt to be generated, and the Executive
takes the proper action. The capability further provides the Executive
with a flexible memory protection technique and allows protection to
be shifted easily as the controlling program is changed.

The Executive insures that the core always contains an empty 
page. When a non-equivalence interrupt is received, the Executive 
locates the required sector using the block directory and initiates a 
transfer to the empty page location. While this transfer is proceeding, 
another page is selected for transfer to the drum to fulfill the empty 
page requirement. The selection of the page to be transferred is made 
by an adaptive type program which predicts, learning through its errors, 
the page that will not be required for the longest time. Various types 
of adaptive programs to accomplish this selection are described in 
Section 6.2. When the initial transfer to core has been completed, an 
interrupt transfers control to the Executive, which updates the block
directory and page address register. The transfer from core to provide
the empty page is then initiated, and, upon completion, the block
directory is updated and the location of the empty page noted. 
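The "always one empty page" discipline can be sketched as a small simulation. The victim choice uses least-recently-used order purely as a stand-in for the adaptive selection routine, which the text does not specify here; the drum read and write that overlap on the Atlas are simply recorded in order. `OneLevelStore` is a hypothetical name.

```python
from collections import OrderedDict


class OneLevelStore:
    """Core pages with one page always kept empty (illustrative sketch).

    Victims are chosen least-recently-used first, a stand-in for the
    adaptive routine that predicts the page not needed for the longest time.
    """

    def __init__(self, pages):
        if pages < 2:
            raise ValueError("need at least one occupied page plus the empty page")
        self.resident = OrderedDict()     # block -> page, least recently used first
        self.empty_page = pages - 1       # one page is kept free at all times
        self.free = list(range(pages - 1))
        self.drum_writes = []             # blocks sent back to the drum, in order

    def reference(self, block):
        if block in self.resident:        # page address registers match
            self.resident.move_to_end(block)
            return self.resident[block]
        # Non-equivalence: read the block from the drum into the empty page.
        page = self.empty_page
        self.resident[block] = page
        # Re-establish the empty page, writing a victim to the drum if needed.
        if self.free:
            self.empty_page = self.free.pop()
        else:
            victim, victim_page = self.resident.popitem(last=False)
            self.drum_writes.append(victim)
            self.empty_page = victim_page
        return page
```

Keeping a page empty in advance means a faulting program waits only for the drum read, never for a write-back; the write that restores the empty page follows afterwards.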

These storage and exchange techniques combined with other 
interesting hardware features provide the Atlas with an excellent multi- 
programming capability. Representative drum times show access and 
transfer times of 6 and 2 milliseconds respectively. Due to the one 
level store and the reference by block address and then by location 
within the block, relocatability is not required as it is provided 
implicitly by the storage method. The program directory reduces the 
number of entries in the block directory that need to be checked for each 
non-equivalence interrupt to determine the block location and, hence,
effectively reduces overhead. The one level store and allocation scheme
provides a definite advantage to the programmer in that programs can be
written without regard to the machine on which they will be run, especially
as far as size is concerned. An interesting effect of the adaptive page
selection routine applies to large multipurpose programs. If, for a
specific application, only a few blocks are needed, they will exist in 
core with a high probability, and the unused portions will remain on the 
drum and not require exchange time. The system efficiency would be 
improved if the transfers to and from core could proceed concurrently. 
But the interleaving of the logically separate memory units, rather than 
the sequential treatment, precludes this, even if the additional high 
speed channel were available. 
6.2 Page Turning Algorithms

The use of adaptive learning programs to select pages for replacement
in core is based on the fundamental assumption that a good correlation
exists between the previous activity of a particular page and its
future usage. Establishing this correlation is not a trivial problem and
depends on a study of the program structure of the system in question.
The arrival times, and hence the core usage, of programs that require a
short burst of service (one or two quanta) followed by a long latent
period awaiting human response are extremely difficult to predict. As
service periods increase, however, prediction becomes feasible. Any
so-called page turning algorithm will increase overhead due to the
time required to record data and make computations. The probable
effect must be carefully determined to ascertain that the overhead is 
justified. 
6.2.1 Two level checking

In a short service environment, a two level page check would 
probably be satisfactory. First, a check could be made to determine if 
any pages in core are for users who are not active during the present 
cycle or, secondarily, have already received service during the cycle. 
If any such pages are encountered, as many as required could be exchanged. 
In the event that all pages belong to the remaining active users for the 
cycle, an arbitrary exchange of a random page would require less over- 
head and probably be as effective as any other determination. 
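The two-level check can be sketched as follows. The function name and the (page, owner) representation are assumptions made for illustration. Pages of inactive users are taken first, then pages of users already serviced this cycle, and only if neither supplies enough pages is a random choice made.

```python
import random

def select_pages(pages_in_core, inactive_users, serviced_users, needed, rng=random):
    """Choose `needed` pages to exchange using the two-level check.

    pages_in_core: list of (page_id, owner) pairs.
    inactive_users: users not active during the present cycle.
    serviced_users: users that have already received service this cycle.
    """
    # Level 1: pages belonging to users not active this cycle.
    chosen = [page for page, owner in pages_in_core if owner in inactive_users]
    # Level 2: pages of users that have already been serviced this cycle.
    if len(chosen) < needed:
        chosen += [page for page, owner in pages_in_core
                   if owner in serviced_users and owner not in inactive_users]
    if len(chosen) >= needed:
        return chosen[:needed]
    # Every remaining page belongs to a still-active user:
    # pick arbitrarily (at random), which costs almost no overhead.
    remaining = [page for page, _ in pages_in_core if page not in chosen]
    return chosen + rng.sample(remaining, needed - len(chosen))
```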
6.2.2 Adaptive algorithms

As soon as any user or users require service for several successive
quanta, an adaptive page selection algorithm becomes worthwhile. The
algorithm attempts to minimize the turnings per unit time, that is, the
number of times a page must be reloaded after being transferred out
of memory. As pointed out previously, a poor choice will, at worst,
result in an exchange time equal to the complete program exchange time
plus the overhead, while better methods improve computer efficiency
and system performance. The number of page turnings will be the basic
parameter rather than page accesses, which are more difficult to determine
and less indicative of problems in system performance. An algorithm is
required that not only considers the transfers of a given page, but has
some method of weighting and decaying so that a page heavily used
during a previous period does not receive undue priority in the future. 
6.2.3 The decay algorithm

J. W. Weil and J. W. Harriman have suggested methods of computing
a figure of merit for each page which is adjusted when the page is
transferred into core and updated for other transfers while it is in core. 
The overhead required to update all figures of merit for each page turning 
would be prohibitive, and a second parameter providing an indication of 
the last update would be helpful. With this parameter set when a page 
is turned out to the drum, no further action is required until the page is 
returned to core. 

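The basic algorithm proposed by J. W. Weil uses the following
equation to compute the figure of merit:

$$F_{m1} = F_{m0}\, e^{-\lambda (t - t_0)} + \delta K$$

where
$F_{m1}$ = new figure of merit
$F_{m0}$ = previous figure of merit
$t_0$ = last time $F_m$ was updated
$t$ = present time
$\lambda$ = decay factor
$\delta$ = 1 if the page is turned in, 0 if the page is in core
$K$ = new page emphasis factor

The time parameter could use the real time clock, but a greater sensitivity
will result if the unit of $t$ is a page turning, so that $t$ reduces in effect
to a turning counter $n$. The previous equation then reduces to:

$$F_{m1} = F_{m0}\, e^{-\lambda (n - n_0)} + K \quad \text{for pages turned in}$$

$$F_{m1} = F_{m0}\, e^{-\lambda (n - n_0)} \quad \text{for pages in core}$$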
The values of $F_m$ and $n_0$ are stored either in a table in the page
itself or in the status and environment storage for the program. When a
turning is required, the page with the lowest $F_m$ is transferred and all
other $F_m$ values are updated.
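The turning-counter version of the decay algorithm can be sketched as below. It assumes the exponential form $F_{m1} = F_{m0}\, e^{-\lambda (n - n_0)} + K$ for a page turned in, with the $K$ term dropped for pages already in core. The class and field names (`DecayPager`, `PageRecord`, `capacity`) are hypothetical. A page sent to the drum keeps its frozen $F_m$ and $n_0$, so no work is done for it until it returns.

```python
import math
from dataclasses import dataclass

@dataclass
class PageRecord:
    merit: float = 0.0   # figure of merit F_m
    last_turn: int = 0   # n_0: turning count at the last update

class DecayPager:
    """Figure-of-merit page selection driven by a page-turning counter."""

    def __init__(self, capacity, decay, emphasis):
        self.capacity = capacity   # page frames available in core
        self.decay = decay         # decay factor (lambda)
        self.emphasis = emphasis   # new page emphasis factor (K)
        self.turns = 0             # n: counts page turnings
        self.records = {}          # page -> PageRecord, kept while on the drum
        self.in_core = {}          # insertion-ordered set of resident pages

    def _decayed(self, rec):
        return rec.merit * math.exp(-self.decay * (self.turns - rec.last_turn))

    def reference(self, page):
        """Reference `page`; return the page turned out to the drum, or None."""
        if page in self.in_core:
            return None                       # no turning, no bookkeeping
        self.turns += 1
        for p in self.in_core:                # update every in-core page
            rec = self.records[p]
            rec.merit, rec.last_turn = self._decayed(rec), self.turns
        victim = None
        if len(self.in_core) >= self.capacity:
            victim = min(self.in_core, key=lambda p: self.records[p].merit)
            del self.in_core[victim]          # record frozen until the page returns
        rec = self.records.setdefault(page, PageRecord(last_turn=self.turns))
        rec.merit = self._decayed(rec) + self.emphasis   # page turned in
        rec.last_turn = self.turns
        self.in_core[page] = None
        return victim
```

With two frames, $\lambda = 0.5$ and $K = 1$, the reference string A, B, A, C, A turns out A and then B. When A returns, its merit is $e^{-1.5} + 1$: the decayed history plus the new page emphasis.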

J. W. Harriman uses accesses to a page, rather than page turnings, to
update its figure of merit. This requires hardware sensing of an access to a
page different from that of the preceding access (regardless of whether the
new page is in core). The individual accesses are used for the $n$ and $n_0$
of the previous equations. The variation requires a greater hardware
capability, including the ability to differentiate between instruction and data
accesses, but results in an extremely fast adaptation and an efficient paging algorithm.
6.2.4 Probability allocation 

Probability allocation provides another method of increasing system 
efficiency by reducing the exchanges required. It has its greatest value 
when the system requirements can be defined to some extent. A typical 
example presented by B. N. Riskin deals with a data processing and 
multiprogramming system. The programs to be operated are known, and
all possible combinations in a cycle can be tabulated along with their
associated probabilities. The process then evolves into one in which storage
is allocated to a particular table or program in such a manner that other
possible users of the same storage occur with a lower probability. The fact
that some programs or program parts must coexist for a cycle is also
considered in space allocation. In a general time-sharing system where the
program mix for a given cycle is extremely variable, the method does not
seem practical.
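One simple way to use such a tabulation is sketched below. It is a greedy pairwise rule assumed for illustration, not Riskin's published procedure, and the function names are hypothetical. The code sums, for every pair of programs, the probability that both appear in the same cycle. It then shares a storage area between the pair least likely to be needed together. Programs that must coexist get a high joint probability and so are never paired.

```python
from itertools import combinations

def joint_probabilities(cycle_table):
    """cycle_table: list of (programs_in_cycle, probability) entries.
    Returns P(both programs in the same cycle) for each program pair."""
    joint = {}
    for programs, prob in cycle_table:
        for pair in combinations(sorted(programs), 2):
            joint[pair] = joint.get(pair, 0.0) + prob
    return joint

def sharing_partner(program, candidates, cycle_table):
    """Pick the candidate least likely to be needed in the same cycle as
    `program`, i.e. the best one to share its storage area."""
    joint = joint_probabilities(cycle_table)
    return min(candidates,
               key=lambda other: joint.get(tuple(sorted((program, other))), 0.0))
```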





6.3 The Pseudo Block Exchange 

The pseudo block exchange method loads programs in their entirety, 
as in the complete exchange method, but uses a block type approach for 
transferring programs from core. When there is insufficient space in core 
for the next user, only that portion of core required by this next user is 
transferred to the external storage. If there are only a few active users 
of widely varying sizes, a considerable saving in exchange time can be 
realized. This method was developed originally for the MIT Time- 
Sharing System in an effort to overcome the problems of slow speed 
transfers in a core and disc only system. Although other systems utilize 
high speed drums to store active programs, the method is deserving of 
general consideration. 

The overhead required to keep track of the location of various 
program blocks appears imposing at first, but reduces to a simple two 
entry table for each user. The first entry would provide the complete 
program length, and the second, a measure of the portion in core or on 
the external device. The program will exist as it did upon completion 
of its last quantum, partially in core and partially in the external store. 
Fig. 7 depicts the actual variation in the stores; the status is shown
only during running quanta, and exchange phases are omitted. It is
conceivable that, at any given instant, parts of several programs, in
addition to the complete program being run, would exist in core.
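The two-entry table can be sketched as follows. The names are hypothetical, and the sketch makes three assumptions: word granularity (not blocks), space reclaimed from the least recently run users first, and every program fitting in core. Only the shortfall is written out, and only the part of the next program still on the drum is read in.

```python
from dataclasses import dataclass

@dataclass
class UserEntry:
    length: int        # complete program length (words)
    in_core: int = 0   # portion currently resident in core (words)

def load_user(users, run_order, name, core_size):
    """Make user `name` fully resident, writing out only as much of other
    programs as is needed. run_order lists users, least recently run first.
    Returns (words written to the drum, words read from the drum)."""
    entry = users[name]
    to_read = entry.length - entry.in_core
    free = core_size - sum(u.in_core for u in users.values())
    shortfall = max(0, to_read - free)
    written = 0
    for other in run_order:
        if shortfall == 0:
            break
        if other == name:
            continue
        take = min(users[other].in_core, shortfall)  # write out part of `other`
        users[other].in_core -= take
        shortfall -= take
        written += take
    entry.in_core = entry.length
    if name in run_order:
        run_order.remove(name)
    run_order.append(name)
    return written, to_read
```

With a 32768-word core and users of 20000 and 16000 words, returning to the first user moves only 3232 words each way. A complete exchange would move both whole programs.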

To simplify transfers, exchanges will usually be handled by
blocks rather than individual words. The method is not a block exchange
in the usual sense, but provides a useful composite of the two general
techniques. It could well serve as an interim exchange method until
larger core and external stores could be integrated into the system.
7 The Completely Hardware Oriented Exchange

The final program exchange technique is quite different from both 
the basic block and the complete program approaches and was not even 
considered in the same general class. This technique is a completely 
hardware oriented system and the problems and potential of this approach 
are just now being realized. 
7.1 The CDC 6600 

The recently developed CDC 6600 incorporates several of the most 
advanced techniques of both multiprocessing and multiprogramming. The 
methods utilized show how many of the complex multiprogramming problems 
may be efficiently handled by hardware design and point out trends in 
hardware system design. The multiprocessing capability is provided by 
ten peripheral and control processors and a central processor. Multi- 
programming is handled both in the central processor and in the peripheral 
processor arithmetic unit, with the peripheral processors themselves 
acting in both cases as on-line users. The peripheral processors each 
have a small independent 4K memory and are used to handle all I/O
transfers and to perform simple arithmetic operations. Programs requiring 
complex or high speed operations use the central processor on a time 
shared basis. The central processor operates on peripheral requests 
only and is not assigned to a specific program. 

Before studying the exchange method in detail, a brief
description of the three major units would be of value. The characteristics
of the central processor, the peripheral and control processors and the
central memory, in reality, determine the basic exchange method. Fig. 8
provides a basic block diagram of the 6600 system.
7.1.1 The central processor

The central processor relies on minimization of memory references 
to achieve a high speed. Programs to be run are stored in the central 
memory and initiated by an exchange jump instruction from a peripheral 
processor. The peripheral processors also establish upper and lower 
bounds to provide memory protection and specify the error exit to eli- 
minate Executive overhead for illegal references. Multiple arithmetic 
and instruction registers are used to minimize storage references and 
increase the overall speed. In addition, multiple banks of memory 
permit concurrent referencing. The actual processor has ten different 
arithmetic units, each designed for a specific operation such as Boolean, 
multiply, etc. Independent instruction strings are detected and may
proceed concurrently, while a reservation system maintains the required 
program order. Programs are formulated in the normal manner, and the 
32 instruction registers are updated frequently enough to maintain the 
program flow without pausing for memory accesses. Branch or jump 
instructions cancel the contents of the remaining instruction registers,
which are then reloaded. Central processor references to the central memory
are made relative to the lower bound set by the peripheral processor exchange
jump instruction, and relocation is provided by simply modifying the lower
boundary.
Hardware features cause the programming of the central processor
to vary from conventional methods when examined at the machine language
level. An example is the effect of multiple registers. The central pro-
cessor has 8 operand registers, X0 through X7, and 8 address registers,
A0 through A7; X1 through X5 receive operands read from central memory,
while X6 and X7 are written to central memory. The action is initiated
simply by a change in the corresponding A register. A0 and X0 are used as
scratch or buffer areas.
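The coupling between A and X registers can be shown with a toy model. The class name is hypothetical, memory is a plain Python list addressed relative to the program, and timing and concurrency are ignored. Setting A1 through A5 reads into the matching X register, setting A6 or A7 stores the matching X register, and A0 causes no memory reference.

```python
class CentralRegisters:
    """Toy model of the A/X register coupling (hypothetical, untimed)."""

    def __init__(self, memory):
        self.memory = memory   # list of words, indexed by relative address
        self.A = [0] * 8       # address registers A0-A7
        self.X = [0] * 8       # operand registers X0-X7

    def set_a(self, i, address):
        """Setting Ai starts the memory reference tied to Xi."""
        self.A[i] = address
        if 1 <= i <= 5:
            self.X[i] = self.memory[address]   # A1-A5: read operand into X1-X5
        elif i in (6, 7):
            self.memory[address] = self.X[i]   # A6-A7: write X6-X7 to memory
        # A0 and X0 are scratch registers: no memory reference.
```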
7.1.2 The peripheral and control processors

The peripheral and control processors are independent units each 
with a separate 4 K, 12 bit word memory. The instruction set includes 
access to the other two major units, I/O and logical operations, fixed
point addition and subtraction and indirect addressing. The ten pro- 
cessors use conventional programming techniques, but the instructions 
are executed on a single multiplexed arithmetic unit. The instructions 
are stored in a ten position barrel which can be thought of as a fixed 
form queue. As each position reaches the arithmetic control unit or 
"slot", all or part of the instruction is executed. If the system is 
considered in the time sharing context, each processor is basically 
an on-line user receiving a fixed quantum (100 ns) of service. A con-
currency technique permits memory references to be serviced and made
ready for service while the instruction is moving through the barrel 
awaiting its next quantum. 
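The barrel can be sketched as a fixed ten-position rotation in which every position passes the slot whether or not its processor has work. Each processor's program is modeled as a list of steps, and one step runs per visit. The function name is hypothetical, and the 100 ns slot time is the figure given above.

```python
def run_barrel(programs, slot_ns=100):
    """Rotate a fixed barrel of processors past one shared "slot".
    Returns (elapsed ns, [(time_ns, processor, step), ...])."""
    iters = [iter(p) for p in programs]
    active = [True] * len(iters)
    clock = 0
    trace = []
    while any(active):
        for pp, steps in enumerate(iters):
            clock += slot_ns                 # every position passes the slot
            if not active[pp]:
                continue                     # idle position still costs a slot
            step = next(steps, None)
            if step is None:
                active[pp] = False           # processor has finished
            else:
                trace.append((clock, pp, step))
    return clock, trace
```

With ten positions, a processor's successive steps are 1000 ns apart even when the other nine are idle. This is the fixed-quantum behaviour described above.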

All data transfers are conducted through the peripheral processors
with an initial transfer to the peripheral memory and a subsequent
transfer to either the external devices or the central memory. Transfers
to and from the central memory are conducted through read/write pyra- 
mids. The pyramids assemble and disassemble 60 bit words in 12 bit 
segments per stage. A position for 60, 48, 36, 24 and 12 bits is avail- 
able in each pyramid so that four transfers can be proceeding in each 
direction simultaneously. The actual transfers from the pyramid to the 
applicable memory are handled through the "slot". Fig. 8 demonstrates 
the basic flow in the time sharing of the peripheral arithmetic unit. 
Communications between any peripheral units and/or external devices
are handled on 12 bidirectional I/O channels with a maximum transfer
rate of 10^6 words (12 bits) per second.
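Pyramid assembly and disassembly amount to packing five 12-bit peripheral words into one 60-bit central memory word and unpacking them again. The sketch below is an illustration. It treats the first segment as the most significant, which is an assumption, and it ignores the staging through the 12, 24, 36, 48 and 60 bit positions.

```python
SEGMENT_BITS = 12
SEGMENTS_PER_WORD = 5          # 5 x 12 = 60 bits

def assemble(segments):
    """Build a 60-bit word from five 12-bit segments, most significant first."""
    if len(segments) != SEGMENTS_PER_WORD:
        raise ValueError("need exactly five 12-bit segments")
    word = 0
    for seg in segments:
        if not 0 <= seg < 1 << SEGMENT_BITS:
            raise ValueError("segment out of 12-bit range")
        word = (word << SEGMENT_BITS) | seg   # each stage adds 12 bits
    return word

def disassemble(word):
    """Split a 60-bit word into five 12-bit segments, most significant first."""
    mask = (1 << SEGMENT_BITS) - 1
    return [(word >> (SEGMENT_BITS * i)) & mask
            for i in reversed(range(SEGMENTS_PER_WORD))]
```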
7.1.3 The central memory

The central memory consists of 131K words (60 bits) organized in
32 logically separate interleaved banks. Consecutive addresses go
into different banks, permitting the rapid loading of the 32 instruction
registers in the central processor. The address and data control
mechanism permits a transfer rate of 10^7 words per second. A novel
control technique is used to handle all references. A clearing house
called the "stunt box" receives all requests and, under a priority
system, passes the addresses on to all banks. The applicable bank
accepts the request if not in use and notifies the "stunt box," which
then initiates the transfer to the data distributor. If the bank referenced
is in use, the request is stored in a hopper. Addresses are sent from
the hopper, the central processor and the peripheral processors, in that
priority, and repeated until accepted. Groups of four banks share a common
line to the data distributor, which serves as the transfer point.
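The interleaving and the stunt box priority can be sketched as a cycle-by-cycle simulation. The bank for an address is taken as the address modulo 32. A bank stays busy for an assumed 10 minor cycles after accepting a request. One address is offered per cycle, from the hopper first, then the central processor, then the peripheral processors, and a rejected address goes to the hopper for retry. The function and constant names are hypothetical.

```python
from collections import deque

BANKS = 32
BANK_BUSY_CYCLES = 10   # assumed bank cycle, in minor cycles

def bank_of(address):
    return address % BANKS   # consecutive addresses fall in different banks

def simulate(cp_requests, pp_requests):
    """Return [(cycle, source, address), ...] in order of acceptance."""
    cp, pp = deque(cp_requests), deque(pp_requests)
    hopper = deque()
    busy_until = [0] * BANKS
    accepted = []
    cycle = 0
    while hopper or cp or pp:
        if hopper:                            # priority: hopper, CP, PP
            source, address = hopper.popleft()
        elif cp:
            source, address = "CP", cp.popleft()
        else:
            source, address = "PP", pp.popleft()
        bank = bank_of(address)
        if busy_until[bank] <= cycle:         # bank free: accept the request
            busy_until[bank] = cycle + BANK_BUSY_CYCLES
            accepted.append((cycle, source, address))
        else:
            hopper.append((source, address))  # retried on a later cycle
        cycle += 1
    return accepted
```

Four consecutive addresses are accepted on four consecutive cycles. Addresses 0 and 32 fall in the same bank, so the second must wait for the bank cycle to end.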
7.2 The Exchange Jump 

An exchange jump instruction provided in the peripheral processor 
repertoire initiates program operation on the central processor. The
instruction first generates an interrupt and transfers the program starting
address in the central memory to the central processor via the accumulator
of the peripheral processor. When the interrupt is sensed, the program
status and environment are set in the central processor's registers and
the information from the previous program is saved. A subsequent exchange
jump instruction referencing the block stored returns the interrupted 
program to the central processor for further service. The format of the 
exchange blocks and a description of the registers involved is given 
in Fig. 9. Once the status is set, the instructions are loaded into the 
32 instruction registers and execution commenced. All central processor 
memory references are made relative to the reference address, one of the 
status registers, permitting easy relocation. The upper boundary of 
memory protection is established by the sum of the reference address 
and the field or program length. The program address register P is an 
index relative to the reference address, and the P = 0 word is used for 
program exit condition storage. An exit feature permits programmer
selection of the exit and stop conditions of the central processor.
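The status swap and relative addressing can be sketched as follows. The package holds only a hypothetical subset of the fields in Fig. 9: P, the reference address RA, the field length FL, and the A and X registers. The P = 0 exit word is not modeled. An exchange jump swaps the running status with the stored package, and every program address is checked against FL and then relocated by adding RA.

```python
from dataclasses import dataclass, field

class BoundsViolation(Exception):
    """Raised for a reference outside RA .. RA + FL - 1."""

@dataclass
class ExchangePackage:
    """Subset of the status saved and loaded by an exchange jump."""
    p: int = 0     # program address, relative to RA
    ra: int = 0    # reference address: relocation base and lower bound
    fl: int = 0    # field length: upper bound is RA + FL
    a: list = field(default_factory=lambda: [0] * 8)
    x: list = field(default_factory=lambda: [0] * 8)

class CentralProcessor:
    def __init__(self):
        self.status = ExchangePackage()

    def exchange_jump(self, package):
        """Load `package` as the running status; return the status it replaces."""
        saved, self.status = self.status, package
        return saved

    def absolute(self, relative):
        """Relocate a program address, enforcing the memory bounds."""
        if not 0 <= relative < self.status.fl:
            raise BoundsViolation(relative)
        return self.status.ra + relative
```

Because relocation is just the addition of RA, moving a program in central memory requires only a change to the RA field of its stored package.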
7.3 Executive Control

The 6600 provides an outstanding example of a hardware imple- 
mented Executive routine. The functions required by the Executive, as 
previously described in Section 3, are all handled to provide a virtual 
two level time-sharing system which utilizes an unusual exchange tech- 
nique. No actual transfer of programs is made, and the exchange jump 
instruction only saves and loads program status and environment infor- 
mation and switches control among programs. Similarly, the multiplexed 
arithmetic unit for the peripheral processors uses the ten independent 
memories, and the barrel and "slot" provide a method of switching
operational control. A lower bound register and a computed upper
bound memory protection scheme are used, with relocation for the central
processor provided by the reference address or lower bound register.
7.4 Critique of the Hardware Oriented Method 

The only major faults to be found are the lack of flexibility and the
general system complexity. Due to the hardware nature of the system
Executive, all procedures are fixed, and no capability of adaptation to
special users' needs is apparent. To obtain the maximum benefit of the
powerful techniques available, the programmer's job is made considerably 
more difficult, although this could be remedied by a comprehensive com- 
piler and system control, or "Monitor", program. Another weakness 
seems to be the relatively small memory. If the ten peripherals are
presumed active, an average program length of only 13.2K is permitted.
The effect of this constraint is, of course, dependent upon the job mix.
On the whole, the system does provide an impressive portent of future
generation multiprogramming systems, and the methods used undoubtedly
will greatly influence future time-sharing systems.
8 Simulation

The use of a system simulation program provides another tech- 
nique for analyzing Exchange methods. Further, it permits investigation 
of the effect of varying certain basic parameters on various Exchange 
algorithms. A time-sharing system is a stochastic system and is 
amenable to the Monte Carlo method of analysis. The arrival of users 
and job characteristics can be treated in a probabilistic sense and used
to supply inputs to the system. The manner in which the various Exchange 
algorithms service these users can be used as a good approximation of 
actual performance in an operating system. Simulation provides a 
valuable secondary result in system planning. The coding required to 
simulate an Exchange algorithm approximates very closely the actual 
coding to be used in an operating system. Thus a measure of the com- 
plexity of the procedure is obtained, and, more importantly, the over- 
head attributed to the various methods can be closely approximated.
Inasmuch as the overhead is a critical factor in the comparison of 
Exchange techniques, accurate determination of overhead time is 
invaluable. 
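The Monte Carlo input side of such a simulator can be sketched as below. This is not the SIM code from the Appendices. The function name is hypothetical, and it assumes exponentially distributed interarrival times and job sizes drawn from a probability table like the one in Figure 13.

```python
import random

def generate_arrivals(duration_s, mean_interarrival_s, job_types, rng):
    """Return [(arrival_time_s, job_size_words), ...] up to duration_s.
    job_types: list of (mean_size_words, probability) pairs."""
    sizes = [size for size, _ in job_types]
    weights = [prob for _, prob in job_types]
    t = 0.0
    arrivals = []
    while True:
        t += rng.expovariate(1.0 / mean_interarrival_s)  # exponential gap
        if t > duration_s:
            return arrivals
        size = rng.choices(sizes, weights=weights)[0]    # pick a job type
        arrivals.append((t, size))
```

For example, `generate_arrivals(3600.0, 300.0, [(2000, 0.5), (4000, 0.5)], random.Random(1))` produces one simulated hour of arrivals for one station, with a 300-second mean arrival time. Seeding the generator makes runs repeatable, so the Exchange methods can be compared on identical input.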

The general development of the system simulator, Program SIM, 
is covered in detail in Appendices I and II. This chapter will treat the 
Exchange portion of SIM specifically and will present and analyze the 
results of several simulation runs using different Exchange algorithms. 
The changing of either the Exchange technique or any of the system
parameters permits many diverse systems to be investigated. The
system parameters include core, drum and disc storage capacity,
number of users allowed and mean program size as well as the other 
job parameters. A computer with a three microsecond cycle time is 
assumed and transfer times computed on this basis. 
8.1 Simulator Exchange Routines 

The Exchange portion of SIM is divided into three major parts 
corresponding to the three basic system functions handled by the 
Exchange routine. These are the LOAD, QUIT and EXCHANGE operations. 
LOAD and QUIT bring the binary program from permanent storage, such 
as tape, to the external storage device used to handle operating programs 
and remove a completed program from this external store, respectively. 
Bounds are placed on the capacity of the external store, and NOLOAD 
conditions may arise due to lack of space. Figure 10 provides a flow 
diagram of the LOAD and QUIT routines. To permit completed programs 
to be saved if desired, upon receipt of a QUIT command the program is 
transferred from core to the external store before actual termination of 
the user's service. Regardless of the Exchange algorithm used, the
LOAD and QUIT operations are the same. The rearrangement or com- 
pacting of the external store was discussed in Chapter 5 and is not 
considered in SIM. It is felt that this is somewhat of a Dispatcher 
problem and would only degrade service slightly and have no effect on 
the comparison of Exchange or Scheduling algorithms. 

Rather than a general treatment of all the Exchange methods
discussed, a system simulation was conducted on a specific system.
This approach was followed to provide a better demonstration of the use
to which the simulator could be put. Also, with the basic hardware 
capabilities fixed, the effectiveness of the various Exchange methods 
under differing operating conditions could be studied. The effect on 
performance of varying single system parameters was also investigated. 
8.1.1 The basic simulated system configuration

The system investigated utilized a drum for the external storage 
device and did not permit concurrent transfers. An average access time 
of 15 milliseconds was used for all read and write transfer operations,
and, as previously mentioned, a cycle time of three microseconds was 
used. Relocatability was provided and memory protection limited to 2K 
blocks except in special cases where the departure from this is noted. 
The normal core size was 32K, but this is a variable input parameter. 
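Under these parameters, a transfer time can be estimated as one 15 ms access plus a per-word cost. The sketch below assumes one word moves per 3 microsecond cycle, which is an assumption since the text does not state the word rate. It also assumes a dump and a load run one after the other, because concurrent transfers are not permitted. Overhead is excluded. On these assumptions a full 32K core takes about 0.113 s to transfer, and swapping two 16K programs takes about 0.128 s.

```python
ACCESS_S = 0.015   # average drum access time: 15 ms (from the text)
CYCLE_S = 3e-6     # cycle time: 3 microseconds (from the text)

def transfer_time(words):
    """Seconds for one drum transfer, assuming one word per cycle."""
    return ACCESS_S + words * CYCLE_S

def swap_time(words_out, words_in):
    """Dump one program, then load another (no concurrent transfers)."""
    return transfer_time(words_out) + transfer_time(words_in)
```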
8.1.2 Exchange Method 1

The first Exchange Algorithm is the basic complete program exchange 
discussed in Section 5.3. The flow chart of Method 1 is provided in 
Figure 11a. As mentioned earlier, no program is transferred from core
unless another user desires service. A check is first made to see if 
the program is in core and, if so, no transfer is required. If another 
user is in core, the normal dump, load and run sequence is implemented. 
The only memory protection required is the protection of the Executive 
Control Routine, and no relocatability is required. 
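The Method 1 decision can be sketched as follows. The function name is hypothetical, the transfer-time function is passed in, and overhead is not modeled. A resident program is dumped only when a different user needs service.

```python
def method1(core_owner, next_user, sizes, transfer_time):
    """Complete program exchange. core_owner is the user now in core (or None).
    Returns (new core owner, exchange time excluding overhead)."""
    if core_owner == next_user:
        return core_owner, 0.0                    # already in core: no transfer
    time = 0.0
    if core_owner is not None:
        time += transfer_time(sizes[core_owner])  # dump the resident program
    time += transfer_time(sizes[next_user])       # load the next program
    return next_user, time
```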

Basic runs were made with 10, 20, 30, 40 and 50 users with a
single job, with the job size increasing on each run, and with a ten-job,
widely mixed operating environment representing a typical computer
center operation. The presentation and analysis of the data will be 
deferred until the latter part of this chapter to permit inclusion of all 
types and to allow comparisons to be made. 

8.1.3 Exchange Method 2

Exchange Method 2, the flow chart of which is provided in 
Figure 11b, is a very rudimentary space allocation scheme. Programs
are fitted into core one after another until core is full. At that time no 
selective dumping is attempted, and all programs in core are transferred 
to the external store and the build up restarted with the next user. Due 
to the requirement of checking memory bounds and preserving program 
environment of all programs transferred, the overhead is somewhat higher 
than for Method 1. 
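The Method 2 rule can be sketched as follows. The function name and return convention are hypothetical, and memory bounds and status preservation are not modeled. Programs are packed in arrival order, and when the next one does not fit, everything resident is written out. The returned event names match the Core Fits and Core Transfers counts in the simulator output.

```python
def method2(core, next_user, sizes, core_size):
    """core: users resident in load order.
    Returns (new core list, words written out, words read in, event)."""
    if next_user in core:
        return core, 0, 0, "in core"
    used = sum(sizes[u] for u in core)
    if used + sizes[next_user] <= core_size:
        return core + [next_user], 0, sizes[next_user], "core fit"
    # No selective dumping: everything in core goes to the external store.
    return [next_user], used, sizes[next_user], "core transfer"
```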

Runs corresponding to those of Method 1 were made. In addition,
separate runs were conducted using the mixed job input on a system 
with single cell resolution in memory protection and on another with 
protection limited to 2K blocks. 

8.1.4 Exchange Method 3

The third and final Exchange algorithm, Method 3, is a complex 
space allocation type. Figure 12 provides a flow chart of the method. 
Core is filled sequentially until insufficient room exists for the next 
program. Several conditions are then tested with the aim of finding the 
smallest program that can be transferred to provide room, thus minimizing
transfer time. Memory protection is provided in 2K blocks. If no single
program can be transferred to allow the next user into core, including
the combination of the last program sequentially plus the unused portion 
of core, the entire core is transferred. While various combinations of 
programs could be tested, the overhead required leads to a point of 
diminishing returns, and no combinations other than those using the 
unused portion of core were tested.
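Method 3 can be sketched over an explicit core map. The map is a list of `[user, length]` segments in address order, with `None` marking a free area, and sizes are rounded up to 2K protection blocks. The function names are hypothetical. The sketch slightly generalizes "last program plus unused portion" to any program plus a free area directly after it. It tries a sequential fit first, then the smallest single program that frees enough room, and otherwise transfers the entire core.

```python
BLOCK = 2048   # memory protection granularity: 2K blocks (from the text)

def blocks(words):
    return -(-words // BLOCK) * BLOCK          # round up to whole blocks

def _place(user, need, room):
    """Occupy `need` words of a `room`-word area; leave the rest free."""
    return [[user, need]] + ([[None, room - need]] if room > need else [])

def method3(segments, next_user, sizes):
    """Mutates `segments`; returns (event, words written out)."""
    if any(u == next_user for u, _ in segments):
        return "in core", 0
    need = blocks(sizes[next_user])
    for i, (u, length) in enumerate(segments):  # 1) sequential fit
        if u is None and length >= need:
            segments[i:i + 1] = _place(next_user, need, length)
            return "core fit", 0
    best = None                                 # 2) smallest single program
    for i, (u, length) in enumerate(segments):
        if u is None:
            continue
        tail_free = i + 1 < len(segments) and segments[i + 1][0] is None
        room = length + (segments[i + 1][1] if tail_free else 0)
        if room >= need and (best is None or length < best[2]):
            best = (i, room, length, tail_free)
    if best is not None:
        i, room, out_words, tail_free = best
        segments[i:i + (2 if tail_free else 1)] = _place(next_user, need, room)
        return "single fit", out_words
    used = sum(length for u, length in segments if u is not None)
    total = sum(length for _, length in segments)
    segments[:] = _place(next_user, need, total)  # 3) entire core transferred
    return "core transfer", used
```

Preferring the smallest qualifying program minimizes the words written out. Limiting the test to single programs keeps the overhead bounded, as the text argues.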

Again basic runs corresponding to those of Method 1 were made. 
In addition, a run with the mixed job load and a 64K core size was
made. 
8.2 Simulator Output 

To evaluate the performance of the Exchange algorithms tested, 
certain basic quantities were recorded for each run. The first of these 
is Exchange Time which consists of the summation of access times, 
actual transfer times and Exchange overhead. The second value recorded 
is Exchange Overhead which consists of specific overhead and all access 
times. Total Entries is the number of times the Exchange routine was entered
and an Exchange decision required. In Methods 2 and 3 Core Fits and 
Core Transfers are also recorded. Core Fits indicates the number of 
times programs were able to be inserted without transfer and Core Trans- 
fer, the number of times the complete core was transferred. The differ- 
ence between the sum of these two and the Total Entries in Method 2 
indicates the number of times the program required was already in core. 
Method 3 also records the number of Single Fits, or times a single 


program was transferred to make room for the new user. The second 


foal 


line of output is generally self-explanatory. Number of Users is the 
number of stations in operation. Users Serviced provides a count of 
the number of users loaded into the system during the run. Jobs 
Generated actually provides an indication of jobs completed as fifty 
jobs are originally generated to start the run, and any number greater 
than this indicates jobs completed. Average Job Size is the average size 
of all jobs exchanged. No Loads and Load Wait Time are determined by 
drum size. No Loads is the number of times a LOAD request was received, 
but no room existed on the drum, and Load Wait Time is the average time 
a user waited to be loaded after receiving a No Load. 
8.3 Simulation with Type 1 Inputs 

The first series of runs conducted operated with a typical job mix 
that could be expected in a time-sharing type system. A mean arrival 
time of 300 seconds and a mean load time of one second were assumed. 
The other job characteristics are shown in Figure 13. The presence of 
several small, highly dynamic jobs with a background of larger compute-
limited jobs typifies a normal computation center loading.
Job    Active   I/O     Repeats   Mean Job   Probability
Type   Time     Time              Size
 1     ?        ?        9.0       2000      ?
 2     ?        ?        ?         4000      0.200
 3     ?        ?        9.0       6000      0.200
 4     2.0      2.0     30.0       8000      0.150
 5     2.0      1.0     30.0      12000      0.100
 6     ?        ?        ?        16000      0.050
 7     ?        ?       12.0      20000      ?
 8     ?        ?        ?        24000      0.050
 9     ?        ?        ?        28000      0.020
10     1.0      ?        ?        32000      0.030

SIM JOB INPUT TABLE

FIGURE 13


A total of five different runs were made using this job mix as input. 
Each run consists of five individual one hour operating periods with 10, 
20, 30, 40 and 50 users permitted respectively. The first three runs 
consisted of the three basic methods with a 32K core. Run 4 used Exchange 
Method 2 but assumed a single cell resolution for memory protection and
transfers. Run 5 used Exchange Method 3 but allowed a 64K memory. 
8.3.1 Analysis of results

The runs prove conclusively that as the mean core available per 
user becomes less than approximately thirty per cent of the mean job 
size, little advantage can be gained in the system under study from
more sophisticated space allocation Exchange techniques. When a 
large number of users are active, complete core transfers are required 
virtually every cycle, and the system behavior approaches that of the 
basic run and reload exchange regardless of the actual exchange 
method used. 
The data from these runs are shown in Figures 14-18. The value
of the single cell memory protection in this system appears to be 
marginal. The increased cost and timing problems created far outweigh 
the slight increase in performance. In comparing Figures 15 and 17 
(block and single cell respectively) only a 1% increase in Total Entries 
and a 2.5% decrease in Exchange Time for the single cell configuration 
can be noted. Although increasing core (Figure 18) greatly increases 
exchange efficiency for the ten user system, by the time thirty users 
are allowed, the performance is only slightly improved over the 32K
core Method 1 results (Figure 14). 

8.4 Simulation with Type 2 Inputs 

The second basic set of runs used a single job type, and, as the 
number of users was held constant, the mean program size was increased 
from 4K to 44K, which resulted in an average program size of approximately
32K. After each size increasing phase, the mean was reset to 4K, the 
number of users incremented and the run repeated. This set again 
emphasizes the relation between mean job size and mean size available 
per user with respect to Exchange method performance. 

8.4.1 Analysis of results

The results are presented in Figures 19-21 which are further 
separated into a-e runs, each with a fixed number of users. As in the 
preceding simulation with a mixed load, comparisons should not be
made entirely on the basis of Exchange Time, but rather should concen-
trate on Total Entries, which is a measure of the actual number of
computation quanta allowed during the operating period. The results
using the Type 2 inputs are as expected. Matching Figures 19, 20 
and 21 shows that the advantages of a particular method tend to dis- 
appear as the mean program size increases. As the number of users
increases this crossover point occurs at a smaller mean size. 

The simulation conducted shows the practical value of such a
system design technique. Simulation provides valuable practical 
information on the relative value of hardware features such as memory 
and drum size and memory protection. Additionally, assistance is 
provided to the system programmer in the development of a suitable
Executive Control Routine.
9 Conclusions

This investigation provides a set of guidelines for the develop- 
ment of an Exchange technique for any system configuration. A general 
Exchange method can first be developed using heuristic analysis. Once 
the general approach applicable to the gross system has been determined,
a simulation, such as the type conducted on the sample system, will
provide valuable information on the selection of the specific method 
and point out worthwhile areas of hardware improvement. 

The investigation also points out the sensitivity of the Exchange 
techniques in general to small variations in the system configurations. 
It is impossible to define an optimal general Exchange method, as ex- 
changes are dependent on both hardware limitations and operating en- 
vironment. There are, however, six major parameters which, if defined, 
allow a general method to be selected. Minor parameters will modify 
the method slightly, but the approach provides a point of departure 
for the system designer. The parameters are: 

1) Mean Program Size - This is expressed in relation to the core 
available to operating programs and can be roughly quantized into three 
primary levels of interest; small - less than one-third core, medium - 
approximately one-half available core and large - greater than half the
available core. 

2) Core Size - This is the actual core available to operating programs,
or the full core less the area required for the Executive. This can also
be quantized as small - less than 32K; medium - 32K to 64K; and large -
above 64K.
3) External Store - This is the type of external store available to
the system. Typically, it consists of drum and/or disc, although any
type of storage device, such as tape or cartridge units, is possible.

4) Number of Simultaneous Transfer Channels - This is the number 
of channels available for transfers and includes any necessary logical 
separation of memory to provide concurrent transfers. 

5) Number of Stations - This is the maximum number of stations 
permitted to be in operation regardless of the type of service being 
received at a specific instant. 

6) Program Type - This is a definition of the expected job mix and
is expressed in terms of compute-limited or I/O-limited jobs, or jobs
with long periods of man-machine communication.

Multiprogramming is a technique that will receive greater attention 
in the future, and the potential of such systems is virtually unlimited. 
This is best shown by a simple example. Imagine a system capable
of handling one hundred stations at a time, and users who require only
four hours of station operating time per day. Assume further that a
time-sharing period of only eight hours per working day is required to
recover half the rental cost of the system. Under these conditions, an
$8,000,000 system could be provided to the on-line users at a cost of
$6.00 per hour. An increased number of stations or a longer time-sharing
period could reduce the cost still further. This hourly rate makes the
power of such a system available to the myriad of small users who
otherwise could not afford such a capability.
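The arithmetic of this example can be checked with a short sketch. It assumes, as the example implies, that all one hundred stations are billed at $6.00 per hour throughout the eight-hour time-sharing period; the figure of 250 working days per year is a hypothetical value not given in the text. The sketch computes the daily time-sharing revenue and the annual rental of which that revenue would cover half.

```python
# Hypothetical check of the time-sharing cost example.
# STATIONS, HOURS_PER_DAY and RATE come from the text;
# WORKING_DAYS_PER_YEAR is an assumption, not stated in the source.
STATIONS = 100
HOURS_PER_DAY = 8            # time-sharing period per working day
RATE = 6.00                  # dollars per station-hour
WORKING_DAYS_PER_YEAR = 250  # assumed

def daily_revenue(stations, hours, rate):
    """Revenue collected from on-line users in one working day."""
    return stations * hours * rate

def implied_annual_rental(stations, hours, rate, days):
    """Annual rental of which time-sharing revenue covers half."""
    return 2 * daily_revenue(stations, hours, rate) * days

print(daily_revenue(STATIONS, HOURS_PER_DAY, RATE))  # 4800.0
print(implied_annual_rental(STATIONS, HOURS_PER_DAY, RATE, WORKING_DAYS_PER_YEAR))  # 2400000.0
```

Under these assumptions the quoted rate implies an annual rental of about $2.4 million; the text does not give the rental terms of the $8,000,000 system, so that link is not checked here.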

The full benefits of such a system cannot be realized without 
the best possible program exchange techniques. Although the hard- 
ware features will improve, the exchange method will still be required 
to do everything possible to compensate for the inherent slowness of 
transfer operations as compared to computation. In the next generation 
of systems, as in the present, the ability of a time-sharing system to 
meet its goal of more service to more users will be critically dependent
on the performance of the Program Exchange Routine.
APPENDIX I
SIM - A MULTIPROGRAMMING SYSTEM SIMULATOR
Simulation has become a valuable analytic method that can be 
applied in diverse situations. The type of simulation of interest is the 
Monte Carlo Method, which is defined in the McGraw-Hill Encyclopedia
of Science and Technology as "A technique for estimating the solution,
x, of a numerical mathematical problem by means of an artificial sampling
experiment...." The method aptly fits the multiprogramming system
problem and can produce worthwhile results. The required probability
distributions associated with users can be determined by general data 
gathering and observation. The use of various algorithms in the Execu- 
tive routine and several hardware configurations in a simulated system 
subjected to a typical loading will produce the data to obtain a measure 
of effectiveness for the various hardware and software configurations. 
The time and expense required to actually evaluate each of the
combinations in an operating system are prohibitive, and the use of
simulation techniques provides the only realistic approach.
Program SIM was developed as a general multiprogramming 
system simulator with the emphasis on the time-sharing type of 
environment. Due to the specific nature of the authors' theses, 
primary attention was given to the Scheduler and Exchanger. The 
normal performance of other specific areas of the Executive routine is
assumed, and these portions are treated in a block method. A prime
example of this is the Dispatcher. While it is a critical area of a
multiprogramming system, no specific characteristics were assumed for
it. When a user has an input or output, the program assumes a waiting
status until, due to the incrementing of the simulator clock, the action
is deemed complete. The required I/O equipment is assumed to be
available at all times. A time-sharing system is characterized by
frequent man-machine communication, and buffered I/O is usually
impossible due to the step-by-step nature of the system. However,
simultaneous I/O by all users, at least to their reactive typewriters,
must be permitted. The gross Dispatcher treatment provides all of this
and avoids only the complications of particular operations. If this
area is studied in the future, the Dispatcher portion could easily be
made more detailed and added to the system simulator.

The job load on the simulated system is created by a job 
generation subroutine (SET). Each job is characterized by six variables, 
which define any job entering the system. Arrival time, the first para- 
meter, is assumed to be exponentially distributed on the basis of 
queuing theory concepts and actual observation at System Development 
Corporation. A variable parameter is the mean arrival time expressed 
in seconds. The value of arrival time was determined by taking the 
natural logarithm of a uniformly distributed random number. The 
second parameter is Load time and represents the time required to 
transfer the binary program from its permanent storage to the temporary 
storage having access to the central memory. The next three parameters,
Active time, I/O time and Repeats, define the actual program operating
characteristics. A program, once loaded into the system, is assumed
to have an active period followed by an I/O period, during which no
service is required from the central processor. This cycle is repeated
until no further repeats are required and the job is completed.
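The text derives exponentially distributed arrival times from the natural logarithm of a uniform random number. A minimal sketch of the standard inverse-transform form of this step follows; the function name and the use of Python's `random` module are illustrative assumptions, not SIM's FORTRAN code. Each draw is an interarrival gap that would be added to the current clock to give a station's next arrival time.

```python
import math
import random

def next_arrival_gap(mean_seconds, rng=random):
    """Draw an exponentially distributed interarrival time in seconds.

    Inverse-transform sampling: if U is uniform on (0, 1], then
    -mean * ln(U) is exponential with the given mean.
    """
    u = 1.0 - rng.random()   # random() is in [0, 1); shift to (0, 1] so log(u) is defined
    return -mean_seconds * math.log(u)

# Example: next arrival for a station, with a 30-second mean.
rng = random.Random(42)
clock = 0.0
arrival = clock + next_arrival_gap(30.0, rng)
```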

Due to the nature of SIM, differences in I/O such as tape transfers,
searches or outputs to reactive devices are not recognized, and the I/O
operations are grouped together. The sixth parameter, Size, completes
the job description. Program size is limited to a maximum length of
one hundred cells less than the full core available for operating
programs. The last five parameters are determined using a Gaussian
random number generator and the mean values received as input. A
uniform random number generator is used to select any of ten possible
job types; the probability of each job type is received as an input.

As soon as a job is completed, that is, when the number of repeats
remaining is less than zero, a new job is generated for that station. An
example of the input to SIM is shown at the end of the program contained
in Figure I-1.
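The job description above can be sketched as a generator in the spirit of SET. Everything here is hypothetical: the `Job` record, the per-type table of means, and the `sigma_frac` parameter (the text does not state the standard deviations used) are assumptions. The sketch draws the job type from the input probabilities with a uniform random number, draws the five Gaussian parameters around their means, takes the arrival time from an exponential distribution, and limits the size to one hundred cells below the usable core.

```python
import random
from dataclasses import dataclass

@dataclass
class Job:
    job_type: int
    arrival: float      # absolute clock time of arrival (seconds)
    load_time: float
    active_time: float
    io_time: float
    repeats: int
    size: int           # cells

def choose_job_type(type_probs, rng):
    """Select a job type from a uniform draw and the input probabilities."""
    u = rng.random()
    cumulative = 0.0
    for job_type, p in enumerate(type_probs):
        cumulative += p
        if u < cumulative:
            return job_type
    return len(type_probs) - 1   # guard against probabilities summing slightly below 1

def generate_job(clock, mean_gap, type_means, sigma_frac, type_probs, core_size, rng):
    """Hypothetical SET-style generator; not the original SIM subroutine.

    type_means[k] = (load, active, io, repeats, size) means for job type k.
    sigma_frac (assumed) sets each standard deviation as a fraction of its mean.
    """
    job_type = choose_job_type(type_probs, rng)
    load, active, io, repeats, size = type_means[job_type]

    def gauss(mean):
        # Gaussian draw, clipped at zero so no time or size is negative
        return max(0.0, rng.gauss(mean, sigma_frac * mean))

    return Job(
        job_type=job_type,
        arrival=clock + rng.expovariate(1.0 / mean_gap),  # exponential interarrival
        load_time=gauss(load),
        active_time=gauss(active),
        io_time=gauss(io),
        repeats=round(gauss(repeats)),
        size=min(int(gauss(size)), core_size - 100),      # 100 cells below usable core
    )
```

Clipping the Gaussian draws at zero is a design choice of this sketch; an unclipped normal distribution would occasionally produce negative times when a mean is small.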

It is possible to obtain a wide variety of output parameters from
the simulator, as it has access to all of the internal system parameters
concerning the operation of the system in question. These parameters
may be gathered on a per-minute, average, total or maximum basis and
thereby present a picture of the system's operation in almost any
degree of detail desired.
For the purpose of comparison, the output parameters of one
run, using a certain hardware and software configuration, may be
saved and then presented with the results of another run with changes
to the hardware and/or software configuration. The output of the
simulator may be in a tabular format or modified to a graphical format,
whichever is deemed best for comparison purposes.

The program operation is cyclic in nature. First, the initial jobs
are generated and the program constants read in and/or initialized for
the run. The main body of the simulator is then entered and the actual
run commenced. The clock is checked against the arrival times of all
allowed users (maximum 50), and a clock value equal to or greater than
a user's arrival time sets the action entries of the status table
(STAT(X,Y)). The Scheduler then determines which requests for service
shall be honored and the order in which they shall be honored, i.e.,
queue information. The Scheduler also determines, from the number
requesting service, the amount of time each user is allowed per cycle.
The cycle begins with the formation of the queue and ends with the
termination of the last user's quantum.
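The start of a cycle can be sketched as follows. Only the equal-to-or-greater arrival test comes from the text; the first-come ordering and the equal division of a fixed cycle length among the requesting users are hypothetical stand-ins, since the actual Scheduler algorithms are described in Lt. Wilder's thesis.

```python
def start_cycle(clock, arrival_times, cycle_length, min_quantum):
    """Form the service queue and the per-user quantum for one cycle.

    arrival_times: dict mapping user number -> absolute arrival (request) time.
    Returns (queue, quantum). The FIFO ordering and equal-share quantum
    are hypothetical stand-ins for the Scheduler.
    """
    # A clock value equal to or greater than the arrival time marks a request
    requesting = [u for u, t in arrival_times.items() if clock >= t]
    # Hypothetical policy: serve in order of arrival (ties broken by user number)
    queue = sorted(requesting, key=lambda u: (arrival_times[u], u))
    if not queue:
        return [], 0.0
    # Hypothetical policy: share the cycle equally, never below min_quantum
    quantum = max(min_quantum, cycle_length / len(queue))
    return queue, quantum
```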

The Scheduler then passes control to the Exchanger. For further specific 
information concerning the operation of the Scheduler section of the 
simulator see Lt. W. G. Wilder’s thesis. 

The Exchanger determines the action required by the next user
in the queue and then LOADS, QUITS or TRANSFERS the user's program.
The actual transfer algorithm is variable, and the methods used are
discussed in detail in Lt. R. R. Hatch’s thesis. Regardless of the
exact method, the required transfers are determined, and the effective
transfer times (TELOAD and TEDUMP) and the exchange overhead are
calculated and added to the clock. In the LOAD and QUIT operations the
size of the external store, such as a drum, is considered, and no load
or storage full conditions are possible.
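The clock charge for one exchange might be modeled as below. The names TELOAD and TEDUMP follow the text; the transfer-time model (one average drum latency plus a fixed word rate), the single-channel assumption, and all parameter names are hypothetical, since the actual transfer algorithms are treated in Lt. Hatch's thesis.

```python
def transfer_time(words, words_per_ms, latency_ms):
    """Hypothetical drum model: one average latency plus word-transfer time."""
    if words <= 0:
        return 0.0
    return latency_ms + words / words_per_ms

def exchange(clock_ms, out_words, in_words, words_per_ms, latency_ms, overhead_ms):
    """Charge one exchange to the simulator clock.

    TEDUMP: time to write the outgoing program to the drum.
    TELOAD: time to read the incoming program into core.
    Either may be zero, e.g. no dump is needed when free core is available.
    Assumes a single channel, so the dump and the load do not overlap.
    """
    tedump = transfer_time(out_words, words_per_ms, latency_ms)
    teload = transfer_time(in_words, words_per_ms, latency_ms)
    return clock_ms + tedump + teload + overhead_ms
```

With two or more simultaneous transfer channels, the dump and load could overlap and the charge would approach the larger of the two times rather than their sum.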

At the completion of a cycle, all users in an I/O status are handled
by decrementing their remaining I/O time by the elapsed cycle time. Users
completing I/O are checked for repeats. If repeats are necessary, the
program is reset to the active mode; if no repeats are required, the
program is terminated by a QUIT command in the next cycle.
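The end-of-cycle I/O handling can be sketched with a simple per-user record. The `UserState` fields and status strings are hypothetical; the repeat test follows the text's convention that a job is complete once its remaining repeats fall below zero.

```python
from dataclasses import dataclass

@dataclass
class UserState:
    status: str          # "active", "io" or "quit" (hypothetical labels)
    io_remaining: float  # seconds of I/O still to run
    repeats: int         # remaining repeats of the active/I-O cycle

def end_of_cycle(users, elapsed):
    """Charge the elapsed cycle time against every user in I/O status."""
    for u in users:
        if u.status != "io":
            continue
        u.io_remaining -= elapsed
        if u.io_remaining > 0:
            continue                 # still waiting on I/O
        u.io_remaining = 0.0
        u.repeats -= 1               # one active/I-O cycle consumed
        if u.repeats >= 0:
            u.status = "active"      # repeat: back to the active mode
        else:
            u.status = "quit"        # complete: QUIT in the next cycle
```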

To avoid long idle periods, a scheme is used to advance the clock
to the next active clock time if there are no users desiring service.
The smallest I/O time remaining (SMALLA) and the nearest arrival time
(SMALLB) are determined, the smaller of these two is added to the
clock, and a new cycle is commenced.
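The idle-clock advance can be sketched as below. The text calls SMALLB the nearest arrival time; this sketch assumes arrival times are absolute clock values and therefore converts the nearest one to a time-until-arrival before comparing it with SMALLA. The function and argument names are illustrative.

```python
def advance_idle_clock(clock, io_remaining, arrival_times):
    """Skip an idle period by jumping to the next I/O completion or arrival.

    io_remaining: remaining I/O times of users waiting on I/O
    arrival_times: absolute arrival times of users not yet arrived (assumed)
    """
    inf = float("inf")
    smalla = min(io_remaining, default=inf)                            # SMALLA
    smallb = min((t - clock for t in arrival_times if t > clock), default=inf)  # SMALLB as a gap
    step = min(smalla, smallb)
    if step == inf:
        return clock          # nothing pending: leave the clock unchanged
    return clock + step
```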

A maximum clock value, read in as a parameter, terminates the run,
and a capability for recycling is provided. All new parameters may be
read in, or the original parameters may be modified, for successive runs.

A flow diagram of SIM is contained in Figure I-2 and a copy of 
the actual program is contained in Appendix II. 

Because the thesis topics of Lt. W. G. Wilder and Lt. R. R. Hatch
are both in the multiprogramming area, and a generalized
multiprogramming simulator could be used in each case, this simulator
represents the joint efforts of both Lt. Wilder and Lt. Hatch.
APPENDIX III
STATUS PRESERVATION IN THE 1604-160 
SATELLITE ENVIRONMENT 

The current 1604-160 Satellite System Control Programs do not
provide for executive-directed program exchange during a job run.
Rather, all switching between job input queues is performed by the
Monitor program during the interjob interval.

While program exchange as a technique will be of limited effective- 
ness until faster external storage devices are available (with present tapes, 
the exchange time would be 16 seconds), the status preservation technique 
necessary is of value. The following analysis indicates the nature of the 
status preservation problem associated with such an exchange and des- 
cribes methods of resolution. 

To provide the desired break-in capability, a priority approach 
should be followed. Upon receipt of a satellite request, the batch job 
would be halted, the environment preserved, and the program transferred. 
The remote user would then be serviced to completion and the batch job 
continued (Figure III-1). While the system provides little in the way of 
exchange techniques, it does allow a detailed investigation of the 
problems involved in environment preservation. 

The batch job can be interrupted at any time except when a transfer
to peripheral equipment is actually in progress. This problem was handled
previously by the use of a flag that could be set by the 1604 and sensed
by the 160. This procedure was described by Lt. R. Hogg and Lt. D.
Glover in previous papers (14 and 15). Since then, an Interrupt Lockout
select code has been provided in the hardware. It is recommended
that this select code be used during all transfer operations; no wait
loop will then be required in the 160. The Lockout code will be selected
immediately before the transfer and deselected immediately upon
completion. To preserve maximum system responsiveness, this code must
be selected and deselected on every pass through a repetitive loop. The
160 interrupt will then remain on the line until the 1604 is free to
service the request.
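The benefit of selecting and deselecting the Lockout code on every pass can be shown with a toy model. This is not 1604 or 160 code; the `LockoutModel` class and its methods are hypothetical, and a pending list stands in for the 160 interrupt remaining on the line. A request raised during one transfer is serviced as soon as that transfer completes, rather than after the whole loop.

```python
class LockoutModel:
    """Toy model of the Interrupt Lockout discipline (hypothetical names)."""

    def __init__(self):
        self.locked = False
        self.pending = []   # requests held on the line while locked out
        self.log = []       # ordered record of transfers and services

    def interrupt(self, name):
        """A satellite request arriving from the 160."""
        if self.locked:
            self.pending.append(name)            # stays on the line
        else:
            self.log.append(("service", name))   # 1604 free: service now

    def transfer(self, record, arriving=()):
        """One pass through a repetitive transfer loop."""
        self.locked = True                       # select Lockout just before
        self.log.append(("transfer", record))
        for name in arriving:                    # requests raised mid-transfer
            self.interrupt(name)
        self.locked = False                      # deselect just after
        while self.pending:                      # held requests are now taken
            self.log.append(("service", self.pending.pop(0)))
```

If the Lockout code were instead held across the entire loop, the same request would wait until every record had been transferred.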

Upon sensing a monitor type job satellite request interrupt the 
1604 will branch to the Exchange portion of Resident. It is this 
Exchanger that will preserve the present environment and initialize the 
Monitor Routine for the satellite job. As discussed in Section 5.2, the
environment could be stored either in internal tables or with the program
transferred to the external store. With only one user ever being exchanged,
storage in internal tables would be most practical. A storage area
could be provided immediately after Resident with only a slight change 
in Resident programming. 

The operation of the proposed Exchange routine is shown in the 
flow chart of Figure III-2. To permit the restoring of the interrupted
job upon completion of the satellite job, a SATJOB flag should be used.
Upon sensing a job termination, the Monitor routine would check the
status of the SATJOB flag. If the flag were set, indicating that a
satellite job had just been completed, a branch to the Exchanger would
be taken. If the flag was not set, a normal batch job termination would
be assumed and the normal flow of Resident followed.
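The proposed exchange sequence with the SATJOB flag can be sketched as a small state model. The `Resident` class, the dictionary standing in for the preserved areas, and the log messages are hypothetical; the sketch only mirrors the order of operations described in the text: preserve the batch environment in an internal table, run the satellite job, and on each job termination let the SATJOB flag decide between restoring the batch job and a normal termination.

```python
class Resident:
    """Hypothetical model of the proposed break-in exchange (not 1604 code)."""

    def __init__(self, environment):
        self.environment = dict(environment)  # live cells, buffers, flip-flop state
        self.saved = None                     # internal table just after Resident
        self.satjob = False                   # SATJOB flag
        self.log = []

    def satellite_request(self, satellite_init):
        """Exchanger entry on a satellite request: preserve, then initialize."""
        self.saved = dict(self.environment)   # preserve the batch environment
        self.environment = dict(satellite_init)
        self.satjob = True
        self.log.append("batch halted, satellite job started")

    def job_terminated(self):
        """Monitor's check of SATJOB at every job termination."""
        if self.satjob:
            self.environment = self.saved     # Exchanger restores the batch job
            self.saved = None
            self.satjob = False
            self.log.append("satellite job done, batch job resumed")
        else:
            self.log.append("normal batch termination")
```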

The double instruction per word format of the 1604 poses the 
only difficult status problem. At present, it is impossible to sense 
the status of the Interrupt Exit Flip Flop which provides the only indica- 
tion of whether the interrupt occurred on an upper or lower instruction. 
Normally, programs exit from the Interrupt Processor through the 
Interrupt Entry Location (Cell 07) and control is automatically returned 
to the proper half word. In the proposed satellite system, however, a 
new program would be injected into the system with new initial
conditions. If the absolute requirement of status preservation is to be
met, the status of this flip flop must be both preserved and capable of
being restored. Programs can be written to determine this status, but
they are time consuming. Once the status has been determined, it can
be reset by artificially creating a divide overflow from the desired half
word. This causes an interrupt to be generated and, with suitable
flag checking in the Interrupt Processor, control can be passed to the
Exchanger with the Interrupt Exit Flip Flop properly set. A hardware
modification has been designed both to sense and to reset this flip flop,
and its implementation should be seriously considered if this type of
operation is to become common.

The transient portions of Resident can be stored in a 1K area of
storage. While the following list is not complete, it includes the main
areas that must be preserved. These areas are also shown in Figure III-1.
1. Cell 00 - Cell 07. These contain the I/O transfer control
words and the interrupt exit. If cell 07 is preserved along with
the upper or lower status of the Interrupt Exit Flip Flop, restoring
these will provide reentry to the point at which the program was
interrupted.

Due to the difference in Resident arrangements, the location of 
the following areas is not a fixed quantity. The approximate length is 
given after each area in octal notation. 

2. Control Information and Monitor Tape Assignment Table (20) 

3. Read Buffer (200) 

4. Write Buffer (230) 

5. Listable Output Buffer (20) 

6. Punched Card Buffer (20) 

7. Stacked Job Buffer (30) 

8. Program Start Table (40) 

9. Monitor Control Cells (10) 
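A quick tally of the listed areas shows why a 1K save area is sufficient. The sketch below sums the approximate octal lengths given above, counting cells 00 through 07 as 10 (octal) words, and assumes 1K means 1024 words; since the list is stated to be incomplete, the total is a lower bound.

```python
# Approximate lengths from the list above, written as octal literals.
AREAS = {
    "Cells 00-07": 0o10,
    "Control information and Monitor tape assignment table": 0o20,
    "Read buffer": 0o200,
    "Write buffer": 0o230,
    "Listable output buffer": 0o20,
    "Punched card buffer": 0o20,
    "Stacked job buffer": 0o30,
    "Program start table": 0o40,
    "Monitor control cells": 0o10,
}

total = sum(AREAS.values())
print(oct(total), total)   # 0o620 400
print(total <= 1024)       # fits the assumed 1024-word area: True
```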

These areas should be saved and then initialized prior to starting
a satellite job. Because a batch job has probably been interrupted, a
different output tape should be designated for the satellite job. A
careful determination should be made of the other areas that require
initial values to be reset.

The actual saving and resetting should proceed in a logical fashion,
preserving first those quantities that will be changed by the Exchanger.
The tape to be used for the transfer can be either preset in the system
or brought in from the 160 as an input parameter.

Due to the nature of the FORTRAN compiler, it would be advisable
to dump the entire core above Resident for all Exchanges. To reduce
transfer times, schemes to determine the portion of core actually in use
should be investigated for use in later systems. As a time-saving step,
upon completion of any transfer the tape should be rewound immediately
so that it is at the load point, ready for the next transfer operation.

This exchange technique will provide an invaluable start towards 
a more sophisticated multiprogramming system. The question of environ- 
ment preservation is a vital one and its solution will apply to any type 
of further system development. Availability of high-speed transfer
media and/or the use of space allocation will still require the
perfect preservation methods developed through actual implementation
of this proposal.